
Differences

This shows you the differences between two versions of the page.

general:bioseqanalysis:genesetanalysis:fcat [2024/02/16 13:50] ingo
general:bioseqanalysis:genesetanalysis:fcat [2025/04/08 16:34] (current) – [fCAT analysis - Output visualization and interpretation] ingo
Line 10: Line 10:
 </WRAP>
 ===== fCAT Core sets =====
-For the course, we have prepared two core sets of proteins that are prevalent((missing in less than 10% of the core taxa)) in **eukaryotes**, and in all **alveolates**. The latter set is obviously closer to //C. parvum//
+For the course, we have prepared one core set of proteins that are prevalent((missing in less than 10% of the core taxa)) in **eukaryotes**
 
 ==== Core set eukaryota ====
Line 46: Line 46:
 To solve this issue temporarily for the current shell, type<WRAP>
 <code>
-export COILSDIR=/home/ubuntu/tools/annotation_tools/COILS2/coils
+export COILSDIR=/home/ubuntu/Share/fdog/annotation_tools/COILS2/coils
 </code>To make this change permanent, add this line to your bash configuration file ''.bashrc''. It will become active upon the next login, or by typing ''source ~/.bashrc''. <wrap important>After 'sourcing' the ''.bashrc'', you will have to re-activate the conda environment: ''conda activate /home/ubuntu/anaconda3/envs/fdog''</wrap>
 </WRAP>
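The permanent variant described above can be sketched as follows. This is only a minimal sketch: the helper name ''add_coilsdir'' is hypothetical and not part of fCAT or fDOG, and the path is the one from the newer revision. The function appends the export line to a bash configuration file only if that exact line is not already present, so running it twice does not duplicate the entry.

```shell
# Hypothetical helper (not part of fCAT/fDOG): append the COILSDIR export
# to a bash configuration file, but only if the exact line is missing.
add_coilsdir() {
    coils_rc_file="${1:-$HOME/.bashrc}"
    coils_line='export COILSDIR=/home/ubuntu/Share/fdog/annotation_tools/COILS2/coils'
    # grep -q: quiet, -x: match the whole line, -F: fixed string (no regex)
    grep -qxF "$coils_line" "$coils_rc_file" 2>/dev/null \
        || printf '%s\n' "$coils_line" >> "$coils_rc_file"
}

# Usage (uncomment to modify your own ~/.bashrc):
# add_coilsdir
# source ~/.bashrc
# conda activate /home/ubuntu/anaconda3/envs/fdog
```

Passing a file name as the first argument lets you try the helper on a scratch file before touching the real ''.bashrc''.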
Line 75: Line 75:
   - run the fCAT analysis on the AWS with the following core set
     - eukaryota
-  - for the **eukaryota core set** and the MetaEuk gene prediction on the CryPa_BCM2021a assembly, invoke the analysis with the following command((This assumes that you are in $HOME/Analysis/fcat)): <code>fcat --coreDir $HOME/Share/ProteinSets/coredir/ --coreSet eukaryota --refspecList "HOMSA@9606@2" --querySpecies Crypto_Metaeuk.fas --taxid 5807 --annoQuery $HOME/Share/Analysis/fCAT/fcatOutput/eukaryota/annotation_dir/CRYPA@5807@240206.json </code>:!: The analysis will run for about **380 sec** when using 4 cores.<WRAP>
+  - for the **eukaryota core set** and the MetaEuk gene prediction on the CryPa_BCM2021a assembly, invoke the analysis with the following command((This assumes that you are in $HOME/Analysis/fcat)): <code>fcat --coreDir $HOME/Share/ProteinSets/coredir/ --coreSet eukaryota --refspecList "HOMSA@9606@2" --querySpecies Crypto_Metaeuk.fas --taxid 5807 --annoQuery $HOME/Analysis/fcat/annotation_dir/CRYPA_METAEUK\@5807\@240209.json </code>:!: The analysis will run for about **380 sec** when using 4 cores.<WRAP>
 <hidden Spoiler>
 <code>
Line 81: Line 81:
 Mode 1:
 genomeID similar dissimilar duplicated missing ignored total
-CRYPA@5807@240206 149 89 0 86 8 333
+CRYPA@5807@240206 149 89 0 86 8 332
 
 Mode 2:
 genomeID similar dissimilar duplicated missing ignored total
-CRYPA@5807@240206 141 97 0 86 8 333
+CRYPA@5807@240206 141 97 0 86 8 332
 
 Mode 3:
 genomeID similar dissimilar duplicated missing ignored total
-CRYPA@5807@240206 215 23 0 86 8 333
+CRYPA@5807@240206 215 23 0 86 8 332
 
 Mode 4:
 genomeID complete fragmented duplicated missing ignored total
-CRYPA@5807@240206 217 21 0 86 8 333
+CRYPA@5807@240206 217 21 0 86 8 332
 
 </code>
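The corrected totals can be checked by hand: in each mode the five category columns add up to 332 (for Mode 1, 149 + 89 + 0 + 86 + 8 = 332). The sketch below automates that check for a single summary row. It assumes that ''total'' is meant to be the sum of the five preceding columns, which holds for the corrected rows above but is not stated on the page; the function name ''check_fcat_row'' is hypothetical and not part of fCAT.

```shell
# Hypothetical helper (not part of fCAT): check one summary row.
# Expected fields: genomeID, four category counts, ignored, total.
check_fcat_row() {
    printf '%s\n' "$1" | awk '{
        sum = $2 + $3 + $4 + $5 + $6            # the five category columns
        status = (sum == $7) ? "ok" : "mismatch"
        # field 5 is the "missing" column; report it as a share of the total
        printf "%s sum=%d total=%d %s missing=%.1f%%\n", $1, sum, $7, status, 100 * $5 / $7
    }'
}

check_fcat_row "CRYPA@5807@240206 149 89 0 86 8 332"
# prints: CRYPA@5807@240206 sum=332 total=332 ok missing=25.9%
```

Run on the old row with a total of 333, the same function reports ''mismatch'', which is the inconsistency this revision corrects.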
Line 101: Line 101:
  
 ==== fCAT analysis - Output visualization and interpretation ====
-fCAT in combination with [[https://bioconductor.org/packages/release/bioc/html/PhyloProfile.html|PhyloProfile]] allows to visualize and explore the results of the geneset completeness analysis. Follow the steps below to :!:  {{ :physaliacg:2024:data:eukaryota.tar.gz |download the data}} to your local computer and :!: to open it in PhyloProfile.
+fCAT in combination with [[https://bioconductor.org/packages/release/bioc/html/PhyloProfile.html|PhyloProfile]] allows to visualize and explore the results of the geneset completeness analysis. Follow the steps below to :!:  {{ :physaliacg:2025:data:CRYPA_Metaeuk-fcat.tar.gz|download the data}} to your local computer and :!: to open it in PhyloProfile.
 <hidden PrecomputedFiles>
 You will find all pre-computed fCAT results at ''/home/ubuntu/Share/Analysis/fCAT/fcatOutput/eukaryota''. Use these if your analysis did not complete in time.
 </hidden>
 === Downloading the data ===
-Download the following three files from the fcat output folder, e.g. ''$HOME/Analyses/fcat/fcatOutput/eukaryota/CRYHO@237895@220307/phyloprofileOutput'' for the //eukaryota// dataset.
+Download the following three files from the fcat output folder, e.g. ''$HOME/Analyses/fcat/fcatOutput/eukaryota/CRYPA@5807@250408/phyloprofileOutput'' for the //eukaryota// dataset.
   - *.phyloprofile :!: These files contain the information about the presence/absence of orthologs to the genes in your coreset together with the domain architecture similarity scores. You will find the information for both your taxon of interest **and** the core taxa. **It is the main input file for PhyloProfile**. :!: Choose the one that represents the fCAT scoring mode you are interested in.
   - *.mod.fa :!: This file contains the sequences of the orthologs in FASTA format
Line 121: Line 121:
   - upload the *domains file into the field at the lower left
   - specify the origin of group IDs you are using
-    - Dataset //alveolata//: select **OMA**
     - Dataset //eukaryota//: select **OrthoDB**
   - plot the results by clicking on ''Plot''
Line 136: Line 135:
   - redo the selection, this time selecting all genes from the //eukaryota// dataset that are present in all core species but are absent in your //C. parvum// gene set((This requires some experimenting to find the correct clade in the tree, unfortunately))
   - if you do not find a single clade comprising all the genes that are missing in //C. parvum// do the following:
-    - Look for the file ''missing.txt'' in your fCat output folder
+    - Look for the file ''{{ :physaliacg:2025:data:crypa_metaeuk-fcat_missing.txt.gz |missing.txt}}'' in your fCat output folder
     - go to the tab ''Customised profile''
     - find the button to upload a gene list for selecting a gene set of interest<WRAP>