Differences
general:bioseqanalysis:genesetanalysis:fcat [2024/02/15 09:36] – [Preparing the fCAT run] ingo | general:bioseqanalysis:genesetanalysis:fcat [2025/04/08 16:34] (current) – [fCAT analysis - Output visualization and interpretation] ingo | ||
---|---|---|---|
Line 10: | Line 10: | ||
</ | </ | ||
===== fCAT Core sets ===== | ===== fCAT Core sets ===== | ||
- | For the course, we have prepared | + | For the course, we have prepared |
==== Core set eukaryota ==== | ==== Core set eukaryota ==== | ||
Line 46: | Line 46: | ||
To solve this issue temporarily for the current shell, type< | To solve this issue temporarily for the current shell, type< | ||
< | < | ||
- | export COILSDIR=/ | + | export COILSDIR=/ |
</ | </ | ||
</ | </ | ||
Line 75: | Line 75: | ||
- run the fCAT analysis on the AWS with the following core set | - run the fCAT analysis on the AWS with the following core set | ||
- eukaryota | - eukaryota | ||
- | - for the **eukaryota core set** and the MetaEuk gene prediction on the CryPa_BCM2021a assembly, invoke the analysis with the following command((This assumes that you are in $HOME/ | + | - for the **eukaryota core set** and the MetaEuk gene prediction on the CryPa_BCM2021a assembly, invoke the analysis with the following command((This assumes that you are in $HOME/ |
<hidden Spoiler> | <hidden Spoiler> | ||
< | < | ||
Line 81: | Line 81: | ||
Mode 1: | Mode 1: | ||
genomeID similar dissimilar duplicated missing ignored total | genomeID similar dissimilar duplicated missing ignored total | ||
- | CRYPA@5807@240206 149 89 0 86 8 333 | + | CRYPA@5807@240206 149 89 0 86 8 332 |
Mode 2: | Mode 2: | ||
genomeID similar dissimilar duplicated missing ignored total | genomeID similar dissimilar duplicated missing ignored total | ||
- | CRYPA@5807@240206 141 97 0 86 8 333 | + | CRYPA@5807@240206 141 97 0 86 8 332 |
Mode 3: | Mode 3: | ||
genomeID similar dissimilar duplicated missing ignored total | genomeID similar dissimilar duplicated missing ignored total | ||
- | CRYPA@5807@240206 215 23 0 86 8 333 | + | CRYPA@5807@240206 215 23 0 86 8 332 |
Mode 4: | Mode 4: | ||
genomeID complete fragmented duplicated missing ignored total | genomeID complete fragmented duplicated missing ignored total | ||
- | CRYPA@5807@240206 217 21 0 86 8 333 | + | CRYPA@5807@240206 217 21 0 86 8 332 |
</ | </ | ||
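As a quick sanity check on the summary tables above, the five category counts in each mode should add up to the value in the ''total'' column. The short Python sketch below is not part of fCAT; the dictionary layout and the helper name ''category_sums'' are made up for illustration. It uses the numbers from the current revision.

```python
# Category counts copied from the fCAT summary above (current revision).
# Column order: similar/complete, dissimilar/fragmented, duplicated, missing, ignored.
# NOTE: this data layout and the helper name are illustrative, not part of fCAT.
REPORTED_TOTAL = 332

counts_by_mode = {
    "Mode 1": (149, 89, 0, 86, 8),
    "Mode 2": (141, 97, 0, 86, 8),
    "Mode 3": (215, 23, 0, 86, 8),
    "Mode 4": (217, 21, 0, 86, 8),
}


def category_sums(counts):
    """Return the sum of the category counts for every mode."""
    return {mode: sum(values) for mode, values in counts.items()}


for mode, row_sum in category_sums(counts_by_mode).items():
    status = "ok" if row_sum == REPORTED_TOTAL else "mismatch"
    print(f"{mode}: {row_sum} ({status})")
```

If a row is reported as a mismatch, the ''total'' column and the category counts disagree, which is the kind of discrepancy this revision corrects.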
Line 101: | Line 101: | ||
==== fCAT analysis - Output visualization and interpretation ==== | ==== fCAT analysis - Output visualization and interpretation ==== | ||
- | fCAT in combination with PhyloProfile allows to visualize and explore the results of the geneset completeness analysis. Follow the steps below to :!: {{ : | + | fCAT in combination with [[https:// |
<hidden PrecomputedFiles> | <hidden PrecomputedFiles> | ||
You will find all pre-computed fCAT results at ''/ | You will find all pre-computed fCAT results at ''/ | ||
</ | </ | ||
=== Downloading the data === | === Downloading the data === | ||
- | Download the following three files from the fcat output folder, e.g. '' | + | Download the following three files from the fcat output folder, e.g. '' |
- *.phyloprofile :!: These files contain the information about the presence/ | - *.phyloprofile :!: These files contain the information about the presence/ | ||
- *.mod.fa :!: This file contains the sequences of the orthologs in FASTA format | - *.mod.fa :!: This file contains the sequences of the orthologs in FASTA format | ||
- *.domains :!: This file contains the feature annotations for the core genes and the orthologs. You will need this for visualization of the feature architectures in PhyloProfile | - *.domains :!: This file contains the feature annotations for the core genes and the orthologs. You will need this for visualization of the feature architectures in PhyloProfile | ||
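To confirm that a local copy of the output folder holds all three file types before opening PhyloProfile, a small helper such as the following can list them by suffix. This is only a sketch: the function name ''find_fcat_downloads'' is hypothetical, and the folder to inspect is passed on the command line, since the actual output path depends on your run.

```python
import sys
from pathlib import Path

# File suffixes named in the list above.
SUFFIXES = (".phyloprofile", ".mod.fa", ".domains")


def find_fcat_downloads(output_dir):
    """Map each suffix to the sorted list of files in output_dir that end with it.

    Hypothetical helper for illustration; it is not part of fCAT.
    """
    folder = Path(output_dir)
    return {
        suffix: sorted(
            path for path in folder.iterdir()
            if path.is_file() and path.name.endswith(suffix)
        )
        for suffix in SUFFIXES
    }


if __name__ == "__main__":
    # Folder to inspect; defaults to the current directory (an assumption).
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for suffix, files in find_fcat_downloads(target).items():
        names = ", ".join(path.name for path in files) or "MISSING"
        print(f"*{suffix}: {names}")
```

Any suffix reported as ''MISSING'' indicates a file that still has to be downloaded.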
=== Opening the data in PhyloProfile === | === Opening the data in PhyloProfile === | ||
- | open the results for the // | + | open the results for the // |
- open a shell on your local computer | - open a shell on your local computer | ||
- start up //**R**// by typing R | - start up //**R**// by typing R | ||
Line 121: | Line 121: | ||
- upload the *.domains file into the field at the lower left | - upload the *.domains file into the field at the lower left | ||
- specify the origin of group IDs you are using | - specify the origin of group IDs you are using | ||
- | - Dataset // | ||
- Dataset // | - Dataset // | ||
- plot the results by clicking on ''Plot'' | - plot the results by clicking on ''Plot'' | ||
Line 133: | Line 132: | ||
- you can click on individual dots in the profile to gain more information about the detected orthologs. This gives you the option to look up sequence and orthogroup information in the public database((of course, this is possible only for groups and sequences for which a public database entry exists. Currently, we support OMA, OrthoDB and NCBI)), and you can inspect the domain architectures of the seed protein and the respective ortholog. | - you can click on individual dots in the profile to gain more information about the detected orthologs. This gives you the option to look up sequence and orthogroup information in the public database((of course, this is possible only for groups and sequences for which a public database entry exists. Currently, we support OMA, OrthoDB and NCBI)), and you can inspect the domain architectures of the seed protein and the respective ortholog. | ||
- check out the tab ''Functions'' in the top menu. It gives you, among others, the option to cluster your phylogenetic profiles based on a variety of distance measures. Try this!((you will have to recheck the box ''Sort sequences by ID'' in the PhyloProfile landing page, though)) | - check out the tab ''Functions'' in the top menu. It gives you, among others, the option to cluster your phylogenetic profiles based on a variety of distance measures. Try this!((you will have to recheck the box ''Sort sequences by ID'' in the PhyloProfile landing page, though)) | ||
- | * once your data is clustered, check the box ‘’apply clustering to main plot’’ and inspect the sorted phyloprofile | ||
- go back to the clustering function and use the mouse to select a clade in the clustering graph((you may have to increase its height)). You will find that the corresponding genes appear in a table to the right. Check the box ''Add to custom plot'' and inspect your selection in tab custom profile | - go back to the clustering function and use the mouse to select a clade in the clustering graph((you may have to increase its height)). You will find that the corresponding genes appear in a table to the right. Check the box ''Add to custom plot'' and inspect your selection in tab custom profile | ||
- redo the selection, this time selecting all genes from the // | - redo the selection, this time selecting all genes from the // | ||
+ | - if you do not find a single clade comprising all the genes that are missing in //C. parvum//, do the following: | ||
+ | - Look for the file '' | ||
+ | - go to the tab '' | ||
+ | - find the button to upload a gene list for selecting a gene set of interest< | ||
+ | <figure PhyloProfile> | ||
+ | {{: | ||
+ | </ | ||
+ | </ | ||
+ | - upload the file '' | ||
+ | - select //Homo sapiens// as the taxon of interest((you can play around with the selection of taxa)) | ||
=== Download the data for the next analysis step === | === Download the data for the next analysis step === | ||
Download the information about the missing genes. We will need this for the last analysis | Download the information about the missing genes. We will need this for the last analysis |