Differences

This shows you the differences between two versions of the page.

--- general:bioseqanalysis:genesetanalysis:fcat [2024/02/16 13:50] – ingo
+++ general:bioseqanalysis:genesetanalysis:fcat [2025/04/08 16:34] (current) – [fCAT analysis - Output visualization and interpretation] ingo
@@ Line 10: / Line 10: @@
 </WRAP>
 ===== fCAT Core sets =====
-For the course, we have prepared two core sets of proteins that are prevalent((missing in less than 10% of the core taxa)) in **eukaryotes**, and in all **alveolates**. The latter set is obviously closer to //C. parvum//.
+For the course, we have prepared one core set of proteins that are prevalent((missing in less than 10% of the core taxa)) in **eukaryotes**.
 ==== Core set eukaryota ====
@@ Line 46: / Line 46: @@
 To solve this issue temporarily for the current shell, type<WRAP>
 <code>
-export COILSDIR=/home/ubuntu/tools/annotation_tools/COILS2/coils
+export COILSDIR=/home/ubuntu/Share/fdog/annotation_tools/COILS2/coils
 </code>To make this change permanent, add this line to your bash configuration file ''.bashrc''. It will become active upon the next login, or immediately after typing ''source ~/.bashrc''. <wrap important>After 'sourcing' the ''.bashrc'', you will have to re-activate the conda environment: ''conda activate /home/ubuntu/anaconda3/envs/fdog''</wrap>
 </WRAP>
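The append step can be scripted so that repeated runs do not duplicate the line in ''.bashrc''. The following is a minimal sketch and not part of the course material: the function name ''persist_export'' is hypothetical, and it assumes a POSIX shell with ''grep'' and ''touch'' available.

```shell
# Hypothetical helper (not part of fCAT or fDOG):
# append a line to a shell configuration file only if it is not already present.
#   $1 = the exact line to add, $2 = the target file
persist_export() {
    touch "$2"
    # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
    grep -qxF -- "$1" "$2" || printf '%s\n' "$1" >> "$2"
}
```

For example, ''persist_export "export COILSDIR=/home/ubuntu/Share/fdog/annotation_tools/COILS2/coils" ~/.bashrc'' adds the line once, no matter how often it is called. Afterwards, run ''source ~/.bashrc'' and re-activate the conda environment as described above.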
@@ Line 75: / Line 75: @@
   - run the fCAT analysis on the AWS with the following core set
     - eukaryota
-  - for the **eukaryota core set** and the MetaEuk gene prediction on the CryPa_BCM2021a assembly, invoke the analysis with the following command((This assumes that you are in $HOME/Analysis/fcat)): <code>fcat --coreDir $HOME/Share/ProteinSets/coredir/ --coreSet eukaryota --refspecList "HOMSA@9606@2" --querySpecies Crypto_Metaeuk.fas --taxid 5807 --annoQuery $HOME/Share/Analysis/fCAT/fcatOutput/eukaryota/annotation_dir/CRYPA@5807@240206.json </code>:!: The analysis will run for about **380 sec** when using 4 cores.<WRAP>
+  - for the **eukaryota core set** and the MetaEuk gene prediction on the CryPa_BCM2021a assembly, invoke the analysis with the following command((This assumes that you are in $HOME/Analysis/fcat)): <code>fcat --coreDir $HOME/Share/ProteinSets/coredir/ --coreSet eukaryota --refspecList "HOMSA@9606@2" --querySpecies Crypto_Metaeuk.fas --taxid 5807 --annoQuery $HOME/Analysis/fcat/annotation_dir/CRYPA_METAEUK\@5807\@240209.json </code>:!: The analysis will run for about **380 sec** when using 4 cores.<WRAP>
 <hidden Spoiler>
 <code>
@@ Line 81: / Line 81: @@
 Mode 1:
 genomeID	similar	dissimilar	duplicated	missing	ignored	total
-CRYPA@5807@240206	149	89	0	86	8	333
+CRYPA@5807@240206	149	89	0	86	8	332
 Mode 2:
 genomeID	similar	dissimilar	duplicated	missing	ignored	total
-CRYPA@5807@240206	141	97	0	86	8	333
+CRYPA@5807@240206	141	97	0	86	8	332
 Mode 3:
 genomeID	similar	dissimilar	duplicated	missing	ignored	total
-CRYPA@5807@240206	215	23	0	86	8	333
+CRYPA@5807@240206	215	23	0	86	8	332
 Mode 4:
 genomeID	complete	fragmented	duplicated	missing	ignored	total
-CRYPA@5807@240206	217	21	0	86	8	333
+CRYPA@5807@240206	217	21	0	86	8	332
 </code>
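In each mode, the ''total'' column should equal the sum of the five category columns; for Mode 1 this is 149 + 89 + 0 + 86 + 8 = 332. The check can be sketched as follows, assuming the summary is available as tab-separated text laid out as shown above. The function name ''check_fcat_totals'' is hypothetical and not part of fCAT.

```shell
# Hypothetical helper (not part of fCAT):
# for every data row, compare the sum of columns 2-6 with the reported total in column 7.
# Reads the files given as arguments, or standard input if none are given.
check_fcat_totals() {
    awk -F'\t' '
        # skip "Mode N:" lines (one field) and the header rows (first field is genomeID)
        NF == 7 && $1 != "genomeID" {
            sum = $2 + $3 + $4 + $5 + $6
            verdict = (sum == $7) ? "ok" : "MISMATCH"
            # output: genomeID, computed sum, reported total, verdict
            printf "%s\t%d\t%d\t%s\n", $1, sum, $7, verdict
        }' "$@"
}
```

Piping the summary text into ''check_fcat_totals'' prints one line per mode. A row whose reported total differs from the computed sum is flagged as ''MISMATCH''.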
@@ Line 101: / Line 101: @@
 ==== fCAT analysis - Output visualization and interpretation ====
-fCAT in combination with [[https://bioconductor.org/packages/release/bioc/html/PhyloProfile.html|PhyloProfile]] allows you to visualize and explore the results of the geneset completeness analysis. Follow the steps below to :!:  {{ :physaliacg:2024:data:eukaryota.tar.gz |download the data}} to your local computer and :!: to open it in PhyloProfile.
+fCAT in combination with [[https://bioconductor.org/packages/release/bioc/html/PhyloProfile.html|PhyloProfile]] allows you to visualize and explore the results of the geneset completeness analysis. Follow the steps below to :!:  {{ :physaliacg:2025:data:CRYPA_Metaeuk-fcat.tar.gz|download the data}} to your local computer and :!: to open it in PhyloProfile.
 <hidden PrecomputedFiles>
 You will find all pre-computed fCAT results at ''/home/ubuntu/Share/Analysis/fCAT/fcatOutput/eukaryota''. Use these if your analysis did not complete in time.
 </hidden>
 === Downloading the data ===
-Download the following three files from the fcat output folder, e.g. ''$HOME/Analyses/fcat/fcatOutput/eukaryota/CRYHO@237895@220307/phyloprofileOutput'' for the //eukaryota// dataset.
+Download the following three files from the fcat output folder, e.g. ''$HOME/Analyses/fcat/fcatOutput/eukaryota/CRYPA@5807@250408/phyloprofileOutput'' for the //eukaryota// dataset.
   - *.phyloprofile :!: These files contain the information about the presence/absence of orthologs to the genes in your core set together with the domain architecture similarity scores. You will find the information for both your taxon of interest **and** the core taxa. **It is the main input file for PhyloProfile**. :!: Choose the one that represents the fCAT scoring mode you are interested in.
   - *.mod.fa :!: This file contains the sequences of the orthologs in FASTA format
@@ Line 121: / Line 121: @@
   - upload the *domains file into the field at the lower left
   - specify the origin of group IDs you are using
-    - Dataset //alveolata//: select **OMA**
     - Dataset //eukaryota//: select **OrthoDB**
   - plot the results by clicking on ''Plot''
@@ Line 136: / Line 135: @@
   - redo the selection, this time selecting all genes from the //eukaryota// dataset that are present in all core species but are absent in your //C. parvum// gene set((This requires some experimenting to find the correct clade in the tree, unfortunately))
   - if you do not find a single clade comprising all the genes that are missing in //C. parvum//, do the following:
-    - Look for the file ''missing.txt'' in your fCAT output folder
+    - Look for the file ''{{ :physaliacg:2025:data:crypa_metaeuk-fcat_missing.txt.gz |missing.txt}}'' in your fCAT output folder
     - go to the tab ''Customised profile''
     - find the button to upload a gene list for selecting a gene set of interest<WRAP>
