===== Module 1: Datasets =====
What species are available for my analysis and how are they related to each other?
----
**Assumption**: All species are related to each other, and this relationship can be represented by a tree.
==== Analysis ====
=== Task 1: What species are available for my analysis? ===
- Go to [[https://www.ncbi.nlm.nih.gov/datasets/|NCBI Datasets]]
- Search for all available eukaryotic genomes
- Filter for genomes "with [[https://www.ncbi.nlm.nih.gov/refseq/about/|RefSeq]] Annotation"
- Add the column "taxid" to the table
- Download the table in CSV format
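
Before moving on, it can help to check that the downloaded table really contains the added column. The following is a minimal sketch, assuming the file is comma-separated with a header row and that the column header is spelled exactly ''taxid'' (the header text in the actual download may differ). The function name ''summarize_table'' is hypothetical.

```python
import csv


def summarize_table(csv_path, column="taxid"):
    """Return (number of rows, number of distinct values in `column`)."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        header = reader.fieldnames or []
        # Assumption: the column is labelled exactly as given in `column`.
        if column not in header:
            raise KeyError(f"column {column!r} not found; header is {header}")
        values = [row[column] for row in reader]
    return len(values), len(set(values))
```

Calling ''summarize_table("genomes.csv")'' (file name is a placeholder) returns the number of table rows and the number of distinct taxids. The two numbers may differ if several rows share one taxid.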
=== Task 2: How are they related to each other? ===
- Extract the values from the "taxid" column and save them in a ''.txt'' file, one taxid per line
- Go to [[https://www.ncbi.nlm.nih.gov/Taxonomy/CommonTree/wwwcmt.cgi|NCBI CommonTree]] and upload your ''.txt'' file (-> menu: **Add from file**)
- Download the tree in Phylip format (-> menu: **save as -> phylip tree**)
- Open the [[https://itol.embl.de/|iTOL web page]]
- Upload the tree into iTOL and explore:
  - How many animals, fungi, and plants are there? (Tip: the nodes in the tree will be named according to entries in [[https://www.ncbi.nlm.nih.gov/taxonomy|NCBI Taxonomy]])
  - Highlight these three clades with different colors
  - Compare the circular vs rectangular representation of the tree
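
The first step of Task 2 (extracting the "taxid" column into a ''.txt'' file) can also be scripted instead of done by hand. The following is a minimal sketch, assuming the table from Task 1 is comma-separated with a header row and that the column is labelled ''taxid'' (matched case-insensitively here, since the exact header text in the download is an assumption). The function name ''extract_taxids'' is hypothetical.

```python
import csv
from pathlib import Path


def extract_taxids(csv_path, txt_path, column="taxid"):
    """Write the distinct values of `column` to txt_path, one per line."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        header = reader.fieldnames or []
        # Assumption: the header spelling may vary in case or spacing.
        wanted = column.strip().lower()
        matches = [h for h in header if h.strip().lower() == wanted]
        if not matches:
            raise KeyError(f"column {column!r} not found; header is {header}")
        key = matches[0]
        seen = {}  # dict keeps first-seen order while dropping duplicates
        for row in reader:
            value = (row[key] or "").strip()
            if value:
                seen.setdefault(value, None)
    Path(txt_path).write_text("\n".join(seen) + "\n", encoding="utf-8")
    return list(seen)
```

For example, ''extract_taxids("genomes.csv", "taxids.txt")'' (both file names are placeholders) writes the file that is then uploaded to NCBI CommonTree. Duplicates are dropped because each taxon only needs to appear once in the uploaded list.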
=== Summary and discussion ===
* Note down your observations and questions
* Discuss with the group
----
[[:compgenomics|Back to main]]