Differences

This shows you the differences between two versions of the page.

asa:seminar:2025:flash [2025/07/13 16:54] ingoasa:seminar:2025:flash [2025/07/16 16:35] (current) ingo
Line 1: Line 1:
-====== Flash talks 2025 ======+====== Poster session 2025 ======
 +===== Abstract Book ===== 
 +<WRAP round box> 
 +{{ :asa:seminar:2025:abstract-book-deckblatt.png?400 |}} 
 +{{ :asa:seminar:2025:flash:asa-s_2025-abstract_book.pdf |}} 
 +</WRAP> 
 + 
 +===== Flash talks 2025 =====
 <WRAP round box>Informed and automated k-mer size selection for genome assembly. Chikhi et al. Bioinformatics 2014, 30(1):31-7<WRAP> <WRAP round box>Informed and automated k-mer size selection for genome assembly. Chikhi et al. Bioinformatics 2014, 30(1):31-7<WRAP>
 <hidden Abstract> <hidden Abstract>
Line 7: Line 14:
  
 **Quiz:** {{ :asa:seminar:2025:flash:p2_quizz.pdf |}} **Quiz:** {{ :asa:seminar:2025:flash:p2_quizz.pdf |}}
 +
  
 **Team:** Boudouassel, Ioan **Team:** Boudouassel, Ioan
 +{{ :general:images:prize.png?200 |1st prize}}
 </WRAP> </WRAP>
 <WRAP round box>Assembly of long, error-prone reads using repeat graphs. Kolmogorov et al. 2019 Nature Biotech 37:540-546<WRAP> <WRAP round box>Assembly of long, error-prone reads using repeat graphs. Kolmogorov et al. 2019 Nature Biotech 37:540-546<WRAP>
Line 17: Line 26:
 **Video** -> separate file **Video** -> separate file
  
-**Quiz:** -> separate file+**Quiz:** {{ :asa:seminar:2025:flash:p03_flashtalk_quiz.pdf |}}
  
 **Team:** Sarach, Deng **Team:** Sarach, Deng
 +
 </WRAP> </WRAP>
 <WRAP round box>BRAKER3: Fully automated genome annotation using RNA-seq and protein evidence with GeneMark-ETP, AUGUSTUS, and TSEBRA. Gabriel et al. 2024. Genome Res 34(5):769-777.<WRAP> <WRAP round box>BRAKER3: Fully automated genome annotation using RNA-seq and protein evidence with GeneMark-ETP, AUGUSTUS, and TSEBRA. Gabriel et al. 2024. Genome Res 34(5):769-777.<WRAP>
Line 46: Line 56:
 </hidden>Link to [[https://applbio.biologie.uni-frankfurt.de/teaching/wiki/lib/exe/fetch.php?media=wiki:2020:paper:uapinyoying2020.genomeres.pdf|PDF]]</WRAP> </hidden>Link to [[https://applbio.biologie.uni-frankfurt.de/teaching/wiki/lib/exe/fetch.php?media=wiki:2020:paper:uapinyoying2020.genomeres.pdf|PDF]]</WRAP>
 {{ :asa:seminar:2025:flash:flashtalk_paper6.mp4 |}} {{ :asa:seminar:2025:flash:flashtalk_paper6.mp4 |}}
- 
-**Quiz:** {{ :asa:seminar:2025:flash:p06_quizz.pdf |}} 
  
 **Team:** Le, Fischer **Team:** Le, Fischer
Line 63: Line 71:
 New genome assemblies have been arriving at a rapidly increasing pace, thanks to decreases in sequencing costs and improvements in third-generation sequencing technologies. For example, the number of vertebrate genome assemblies currently in the NCBI (National Center for Biotechnology Information) database increased by more than 50% to 1,485 assemblies in the year from July 2018 to July 2019. In addition to this influx of assemblies from different species, new human de novo assemblies are being produced, which enable the analysis of not only small polymorphisms, but also complex, large-scale structural differences between human individuals and haplotypes. This coming era and its unprecedented amount of data offer the opportunity to uncover many insights into genome evolution but also present challenges in how to adapt current analysis methods to meet the increased scale. Cactus, a reference-free multiple genome alignment program, has been shown to be highly accurate, but the existing implementation scales poorly with increasing numbers of genomes, and struggles in regions of highly duplicated sequences. Here we describe progressive extensions to Cactus to create Progressive Cactus, which enables the reference-free alignment of tens to thousands of large vertebrate genomes while maintaining high alignment quality. We describe results from an alignment of more than 600 amniote genomes, which is to our knowledge the largest multiple vertebrate genome alignment created so far. New genome assemblies have been arriving at a rapidly increasing pace, thanks to decreases in sequencing costs and improvements in third-generation sequencing technologies. For example, the number of vertebrate genome assemblies currently in the NCBI (National Center for Biotechnology Information) database increased by more than 50% to 1,485 assemblies in the year from July 2018 to July 2019. 
In addition to this influx of assemblies from different species, new human de novo assemblies are being produced, which enable the analysis of not only small polymorphisms, but also complex, large-scale structural differences between human individuals and haplotypes. This coming era and its unprecedented amount of data offer the opportunity to uncover many insights into genome evolution but also present challenges in how to adapt current analysis methods to meet the increased scale. Cactus, a reference-free multiple genome alignment program, has been shown to be highly accurate, but the existing implementation scales poorly with increasing numbers of genomes, and struggles in regions of highly duplicated sequences. Here we describe progressive extensions to Cactus to create Progressive Cactus, which enables the reference-free alignment of tens to thousands of large vertebrate genomes while maintaining high alignment quality. We describe results from an alignment of more than 600 amniote genomes, which is to our knowledge the largest multiple vertebrate genome alignment created so far.
 </hidden>Link to [[https://www.nature.com/articles/s41586-020-2871-y#Sec6|PDF]]</WRAP> </hidden>Link to [[https://www.nature.com/articles/s41586-020-2871-y#Sec6|PDF]]</WRAP>
-{{ :asa:seminar:2025:flash:p07_flash.mp4 |}}+{{ :asa:seminar:2025:flash:progressivecactus_flashtalk_bernshausen_chanthirakanthan.mp4 |}}
  
 **Team:** Bernshausen, Chanthirakanthan **Team:** Bernshausen, Chanthirakanthan
 +</WRAP>
 +<WRAP round box>Highly accurate protein structure prediction with AlphaFold. Jumper et al. Nature 596: 583–589 (2021)<WRAP>
 +<hidden Abstract>
 +Proteins are essential to life, and understanding their structure can facilitate a mechanistic understanding of their function. Through an enormous experimental effort, the structures of around 100,000 unique proteins have been determined, but this represents a small fraction of the billions of known protein sequences. Structural coverage is bottlenecked by the months to years of painstaking effort required to determine a single protein structure. Accurate computational approaches are needed to address this gap and to enable large-scale structural bioinformatics. Predicting the three-dimensional structure that a protein will adopt based solely on its amino acid sequence (the structure prediction component of the ‘protein folding problem’) has been an important open research problem for more than 50 years. Despite recent progress, existing methods fall far short of atomic accuracy, especially when no homologous structure is available. Here we provide the first computational method that can regularly predict protein structures with atomic accuracy even in cases in which no similar structure is known. We validated an entirely redesigned version of our neural network-based model, AlphaFold, in the challenging 14th Critical Assessment of protein Structure Prediction (CASP14), demonstrating accuracy competitive with experimental structures in a majority of cases and greatly outperforming other methods. Underpinning the latest version of AlphaFold is a novel machine learning approach that incorporates physical and biological knowledge about protein structure, leveraging multi-sequence alignments, into the design of the deep learning algorithm.
 +</hidden>Link to [[https://www.nature.com/articles/s41586-021-03819-2|PDF]]</WRAP>
 +{{ :asa:seminar:2025:flash:p10_alphafold-flashtalk.mp4 |}}
 +
 +**Team:** Alkanat, Paraparan
 +</WRAP>
 +<WRAP round box>Terminating contamination: large-scale search identifies more than 2,000,000 contaminated entries in GenBank. Steinegger and Salzberg Genome Biology Vol. 21 115 (2020)<WRAP>
 +<hidden Abstract>
 +Genomic analyses are sensitive to contamination in public databases caused by incorrectly labeled reference sequences. Here, we describe Conterminator, an efficient method to detect and remove incorrectly labeled sequences by an exhaustive all-against-all sequence comparison. Our analysis reports contamination of 2,161,746, 114,035, and 14,148 sequences in the RefSeq, GenBank, and NR databases, respectively, spanning the whole range from draft to “complete” model organism genomes. Our method scales linearly with input size and can process 3.3 TB in 12 days on a 32-core computer. Conterminator can help ensure the quality of reference databases. Source code (GPLv3): https://github.com/martin-steinegger/conterminator
 +</hidden>Link to [[https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02023-1|PDF]]</WRAP>
 +{{ :asa:seminar:2025:flash:p11_contamination-paper.mp4 |}}
 +
 +**Team:** Berger, Voss
 +</WRAP>
 +<WRAP round box>Clustering predicted structures at the scale of the known protein universe. Barrio-Hernandez et al. Nature 622: 637–645 (2023)<WRAP>
 +<hidden Abstract>
 +Proteins are key to all cellular processes and their structure is important in understanding their function and evolution. Sequence-based predictions of protein structures have increased in accuracy, and over 214 million predicted structures are available in the AlphaFold database. However, studying protein structures at this scale requires highly efficient methods. Here, we developed a structural-alignment-based clustering algorithm, Foldseek cluster, that can cluster hundreds of millions of structures. Using this method, we have clustered all of the structures in the AlphaFold database, identifying 2.30 million non-singleton structural clusters, of which 31% lack annotations representing probable previously undescribed structures. Clusters without annotation tend to have few representatives covering only 4% of all proteins in the AlphaFold database. Evolutionary analysis suggests that most clusters are ancient in origin but 4% seem to be species specific, representing lower-quality predictions or examples of de novo gene birth. We also show how structural comparisons can be used to predict domain families and their relationships, identifying examples of remote structural similarity. On the basis of these analyses, we identify several examples of human immune-related proteins with putative remote homology in prokaryotic species, illustrating the value of this resource for studying protein function and evolution across the tree of life.
 +</hidden>Link to [[https://applbio.biologie.uni-frankfurt.de/teaching/wiki/lib/exe/fetch.php?media=asa:seminar:papers:s41586-023-06510-w.pdf|PDF]]</WRAP>
 +{{ :asa:seminar:2025:flash:p013_batman-zeng.mp4 |}}
 +
 +**Team:** Batman, Zeng
 +</WRAP>
 +<WRAP round box>Fast and sensitive taxonomic assignment to metagenomic contigs. Mirdita et al. Bioinformatics 37(18):3029–3031 (2021)<WRAP>
 +<hidden Abstract>
 +**Summary**
 +MMseqs2 taxonomy is a new tool to assign taxonomic labels to metagenomic contigs. It extracts all possible protein fragments from each contig, quickly retains those that can contribute to taxonomic annotation, assigns them with robust labels and determines the contig’s taxonomic identity by weighted voting. Its fragment extraction step is suitable for the analysis of all domains of life. MMseqs2 taxonomy is 2–18× faster than state-of-the-art tools and also contains new modules for creating and manipulating taxonomic reference databases as well as reporting and visualizing taxonomic assignments.
 +**Availability and implementation**
 +MMseqs2 taxonomy is part of the MMseqs2 free open-source software package available for Linux, macOS and Windows at https://mmseqs.com.
 +</hidden>Link to [[https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8479651|PDF]]</WRAP>
 +{{ :asa:seminar:2025:flash:p14_qinashirov.mp4 |}}
 +
 +**Quiz:** {{ :asa:seminar:2025:flash:quiz_p14_2025.pdf |}}
 +
 +**Team:** Ashirov, Qin
 +</WRAP>
 +<WRAP round box>Accurate proteome-wide missense variant effect prediction with AlphaMissense. Cheng et al. Science 381:eadg7492 (2023)<WRAP>
 +<hidden Abstract>
 +The vast majority of missense variants observed in the human genome are of unknown clinical significance. We present AlphaMissense, an adaptation of AlphaFold fine-tuned on human and primate variant population frequency databases to predict missense variant pathogenicity. By combining structural context and evolutionary conservation, our model achieves state-of-the-art results across a wide range of genetic and experimental benchmarks, all without explicitly training on such data. The average pathogenicity score of genes is also predictive for their cell essentiality, capable of identifying short essential genes that existing statistical approaches are underpowered to detect. As a resource to the community, we provide a database of predictions for all possible human single amino acid substitutions and classify 89% of missense variants as either likely benign or likely pathogenic.
 +</hidden>Link to [[https://www.science.org/doi/10.1126/science.adg7492|PDF]]</WRAP>
 +{{ :asa:seminar:2025:flash:p15_li-sahin_alphamissense.mp4 |}}
 +
 +**Team:** Li, Sahin
 </WRAP> </WRAP>