===== Module 1: Datasets =====

What species are available for my analysis and how are they related to each other?

----

**Assumption**: All species are related to each other, and this relationship can be represented by a tree

==== Analysis ====

=== Task 1: What species are available for my analysis? ===

  - Go to [[https://www.ncbi.nlm.nih.gov/datasets/|NCBI Datasets]]
  - Search for all available eukaryotic genomes
  - Filter for genomes "with [[https://www.ncbi.nlm.nih.gov/refseq/about/|RefSeq]] Annotation"
  - Add column "taxid"
  - Download table in CSV format

=== Task 2: How are they related to each other? ===

  - Extract the information from the "taxid" column and save it in a ''.txt'' file
  - Go to [[https://www.ncbi.nlm.nih.gov/Taxonomy/CommonTree/wwwcmt.cgi|NCBI CommonTree]] and upload your ''.txt'' file (-> menu: **Add from file**)
  - Download the tree in Phylip format (-> menu: **save as -> phylip tree**)
  - Open the [[https://itol.embl.de/|iTOL web page]]
  - Upload the tree into iTOL and explore:
    - How many animals, how many fungi, how many plants are there? (Tip: the nodes in the tree will be named according to entries in [[https://www.ncbi.nlm.nih.gov/taxonomy|NCBI Taxonomy]])
    - Highlight these three clades with different colors
    - Compare the circular vs rectangular representation of the tree

=== Summary and discussion ===

  * Note down your observations and questions
  * Discuss with the group

----

[[:compgenomics|Back to main]]
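
**Optional helper for Task 2, step 1.** The "taxid" extraction can be done in a spreadsheet, but a short script is less error-prone for large tables. The following is a minimal sketch, not an official NCBI tool. It assumes the downloaded table is a comma-separated file with a header row, that the taxid column header contains "taxid" or "taxonomic id" (the exact header in your download may differ), and that a file with one ID per line is suitable for upload. The function names and the file names in the usage comment are hypothetical.

```python
import csv
from pathlib import Path


def find_taxid_column(fieldnames):
    """Return the first header that looks like a taxid column (heuristic, assumed naming)."""
    for name in fieldnames:
        normalized = name.lower().replace(" ", "").replace("_", "")
        if "taxid" in normalized or "taxonomicid" in normalized:
            return name
    raise ValueError(f"No taxid-like column found in: {fieldnames}")


def extract_taxids(csv_path, txt_path):
    """Write the unique, non-empty taxids from csv_path to txt_path, one per line."""
    # utf-8-sig strips a byte-order mark if the file has one
    with open(csv_path, newline="", encoding="utf-8-sig") as handle:
        reader = csv.DictReader(handle)
        column = find_taxid_column(reader.fieldnames or [])
        seen = set()
        taxids = []
        for row in reader:
            value = (row[column] or "").strip()
            # Several assemblies can share a species, so keep each taxid only once
            if value and value not in seen:
                seen.add(value)
                taxids.append(value)
    Path(txt_path).write_text("\n".join(taxids) + "\n", encoding="utf-8")
    return taxids


# Example call (hypothetical file names, adjust to your download):
# extract_taxids("ncbi_dataset.csv", "taxids.txt")
```

Duplicates are removed while preserving the order of first appearance, so the resulting ''.txt'' file lists each species once even when several genome assemblies exist for it.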