Differences

This shows you the differences between two versions of the page.

Older: asa:seminar:2025:flash [2025/07/13 17:03] ingo
Newer: asa:seminar:2025:flash [2025/07/16 16:35] (current) ingo
Line 1 (older) / Line 1 (newer):
-====== Flash talks 2025 ======
 +====== Postersession 2025 ======
 +===== Abstract Book ===== 
 +<WRAP round box> 
 +{{ :asa:seminar:2025:abstract-book-deckblatt.png?400 |}} 
 +{{ :asa:seminar:2025:flash:asa-s_2025-abstract_book.pdf |}} 
 +</WRAP> 
 + 
 +===== Flash talks 2025 =====
 <WRAP round box>Informed and automated k-mer size selection for genome assembly. Chikhi et al. Bioinformatics 2014, 30(1):31-7<WRAP>
 <hidden Abstract>
Line 7 (older) / Line 14 (newer):
  
 **Quiz:** {{ :asa:seminar:2025:flash:p2_quizz.pdf |}}
 +
  
 **Team:** Boudouassel, Ioan
 +{{ :general:images:prize.png?200 |1st prize}}
 </WRAP>
 <WRAP round box>Assembly of long, error-prone reads using repeat graphs. Kolmogorov et al. 2019 Nature Biotech 37:540-546<WRAP>
Line 17 (older) / Line 26 (newer):
 **Video** -> separate file
  
-**Quiz:** -> separate file
 +**Quiz:** {{ :asa:seminar:2025:flash:p03_flashtalk_quiz.pdf |}}
  
 **Team:** Sarach, Deng
 +
 </WRAP>
 <WRAP round box>BRAKER3: Fully automated genome annotation using RNA-seq and protein evidence with GeneMark-ETP, AUGUSTUS, and TSEBRA. Gabriel et al. 2024. Genome Res 34(5):769-777.<WRAP>
Line 46 (older) / Line 56 (newer):
 </hidden>Link to [[https://applbio.biologie.uni-frankfurt.de/teaching/wiki/lib/exe/fetch.php?media=wiki:2020:paper:uapinyoying2020.genomeres.pdf|PDF]]</WRAP>
 {{ :asa:seminar:2025:flash:flashtalk_paper6.mp4 |}}
- 
-**Quiz:** {{ :asa:seminar:2025:flash:p06_quizz.pdf |}} 
  
 **Team:** Le, Fischer
Line 74 (older) / Line 82 (newer):
  
 **Team:** Alkanat, Paraparan
 +</WRAP>
 +<WRAP round box>Terminating contamination: large-scale search identifies more than 2,000,000 contaminated entries in GenBank. Steinegger and Salzberg Genome Biology Vol. 21 115 (2020)<WRAP>
 +<hidden Abstract>
 +Genomic analyses are sensitive to contamination in public databases caused by incorrectly labeled reference sequences. Here, we describe Conterminator, an efficient method to detect and remove incorrectly labeled sequences by an exhaustive all-against-all sequence comparison. Our analysis reports contamination of 2,161,746, 114,035, and 14,148 sequences in the RefSeq, GenBank, and NR databases, respectively, spanning the whole range from draft to “complete” model organism genomes. Our method scales linearly with input size and can process 3.3 TB in 12 days on a 32-core computer. Conterminator can help ensure the quality of reference databases. Source code (GPLv3): https://github.com/martin-steinegger/conterminator
 +</hidden>Link to [[https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02023-1|PDF]]</WRAP>
 +{{ :asa:seminar:2025:flash:p11_contamination-paper.mp4 |}}
 +
 +**Team:** Berger, Voss
 +</WRAP>
 +<WRAP round box>Clustering predicted structures at the scale of the known protein universe. Barrio-Hernandez et al. Nature 622: 637–645 (2023)<WRAP>
 +<hidden Abstract>
 +Proteins are key to all cellular processes and their structure is important in understanding their function and evolution. Sequence-based predictions of protein structures have increased in accuracy, and over 214 million predicted structures are available in the AlphaFold database. However, studying protein structures at this scale requires highly efficient methods. Here, we developed a structural-alignment-based clustering algorithm—Foldseek cluster—that can cluster hundreds of millions of structures. Using this method, we have clustered all of the structures in the AlphaFold database, identifying 2.30 million non-singleton structural clusters, of which 31% lack annotations representing probable previously undescribed structures. Clusters without annotation tend to have few representatives covering only 4% of all proteins in the AlphaFold database. Evolutionary analysis suggests that most clusters are ancient in origin but 4% seem to be species specific, representing lower-quality predictions or examples of de novo gene birth. We also show how structural comparisons can be used to predict domain families and their relationships, identifying examples of remote structural similarity. On the basis of these analyses, we identify several examples of human immune-related proteins with putative remote homology in prokaryotic species, illustrating the value of this resource for studying protein function and evolution across the tree of life.
 +</hidden>Link to [[https://applbio.biologie.uni-frankfurt.de/teaching/wiki/lib/exe/fetch.php?media=asa:seminar:papers:s41586-023-06510-w.pdf|PDF]]</WRAP>
 +{{ :asa:seminar:2025:flash:p013_batman-zeng.mp4 |}}
 +
 +**Team:** Batman, Zeng
 +</WRAP>
 +<WRAP round box>Fast and sensitive taxonomic assignment to metagenomic contigs. Mirdita et al. Bioinformatics 37(18):3029–3031 (2021)<WRAP>
 +<hidden Abstract>
 +**Summary**
 +MMseqs2 taxonomy is a new tool to assign taxonomic labels to metagenomic contigs. It extracts all possible protein fragments from each contig, quickly retains those that can contribute to taxonomic annotation, assigns them with robust labels and determines the contig’s taxonomic identity by weighted voting. Its fragment extraction step is suitable for the analysis of all domains of life. MMseqs2 taxonomy is 2–18× faster than state-of-the-art tools and also contains new modules for creating and manipulating taxonomic reference databases as well as reporting and visualizing taxonomic assignments.
 +**Availability and implementation**
 +MMseqs2 taxonomy is part of the MMseqs2 free open-source software package available for Linux, macOS and Windows at https://mmseqs.com.
 +</hidden>Link to [[https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8479651|PDF]]</WRAP>
 +{{ :asa:seminar:2025:flash:p14_qinashirov.mp4 |}}
 +
 +**Quiz:** {{ :asa:seminar:2025:flash:quiz_p14_2025.pdf |}}
 +**Team:** Ashirov, Qin
 +</WRAP>
 +<WRAP round box>Accurate proteome-wide missense variant effect prediction with AlphaMissense. Cheng et al. Science 381:eadg7492 (2023)<WRAP>
 +<hidden Abstract>
 +The vast majority of missense variants observed in the human genome are of unknown clinical significance. We present AlphaMissense, an adaptation of AlphaFold fine-tuned on human and primate variant population frequency databases to predict missense variant pathogenicity. By combining structural context and evolutionary conservation, our model achieves state-of-the-art results across a wide range of genetic and experimental benchmarks, all without explicitly training on such data. The average pathogenicity score of genes is also predictive for their cell essentiality, capable of identifying short essential genes that existing statistical approaches are underpowered to detect. As a resource to the community, we provide a database of predictions for all possible human single amino acid substitutions and classify 89% of missense variants as either likely benign or likely pathogenic.
 +</hidden>Link to [[https://www.science.org/doi/10.1126/science.adg7492|PDF]]</WRAP>
 +{{ :asa:seminar:2025:flash:p15_li-sahin_alphamissense.mp4 |}}
 +
 +**Team:** Li, Sahin
 </WRAP>