Differences

This shows you the differences between two versions of the page.

general:softwares:fcat [2021/10/12 11:29] vinhgeneral:softwares:fcat [2022/11/16 14:22] (current) – [fCAT score modes] vinh
Line 1: Line 1:
 ====== fCAT ====== ====== fCAT ======
  
-fCAT is a **f**eature-aware **C**ompleteness **A**ssessment **T**ool, that helps to answer the question "How complete is my gene set?". fCAT checks for the presence of conserved genes (called the core genes) for a specific taxonomy clade, where the target species belongs to. In addition to using the length criteria to classify the found orthologs (as same as [[https://busco.ezlab.org/|BUSCO]]), fCAT utilizes the domain architecture similarity [[https://github.com/BIONF/FAS|FAS scores]] to further validate the orthologs. fCAT outputs not only the summary result in tabular text file but also the phylogenetic profile of the core genes, which can be visualized and analyzed using the tool [[https://github.com/BIONF/PhyloProfile|PhyloProfile]].+One of the critical steps in a genome sequencing project is to assess the completeness of the predicted gene set. The standard workflow starts with the identification of a set of core genes for the taxonomic group to which the target species belongs. The fraction of missing core genes then serves as a proxy for the completeness of the target gene set.
  
-<html><img width="744" alt="image" src="https://user-images.githubusercontent.com/19269760/127915398-94cc2b3a-d292-4b33-8241-6014f3010a4c.png"></html>+[[https://github.com/BIONF/fCAT|fCAT]] is a **f**eature-aware **C**ompleteness **A**ssessment **T**ool that helps to answer the question "How complete is my gene set?". In particular, fCAT checks for the presence of conserved genes (the core genes) of a specific taxonomy clade in the target gene set using feature-aware directed ortholog search ([[https://github.com/BIONF/fDOG|fDOG]]). In addition to the length criteria for classifying the found orthologs (the same as in [[https://busco.ezlab.org/|BUSCO]]), fCAT utilizes the domain architecture similarity [[https://github.com/BIONF/FAS|FAS scores]] to further validate the orthologs. The latter gives an alternative view on the accuracy of the target gene models, showing how much the target orthologs differ from the core genes in their domain architecture.
  
 +fCAT outputs both the summary result in a tabular text file and the phylogenetic profile of the core genes, which can be visualized using the tool [[https://github.com/BIONF/PhyloProfile|PhyloProfile]]. By analyzing the profiles of the entire orthologous groups within a specific taxonomy clade, we can further identify and ultimately correct erroneous gene annotations.
 +
 +{{:ecoevo_molevol:wiki:mee:figures:fcat_workflow.png?600|}}
 ====== Table of Contents ====== ====== Table of Contents ======
  
Line 47: Line 50:
   * //phyloprofileOutput//: folder contains output phylogenetic profile data that can be used with [[https://github.com/BIONF/PhyloProfile|PhyloProfile tool]]   * //phyloprofileOutput//: folder contains output phylogenetic profile data that can be used with [[https://github.com/BIONF/PhyloProfile|PhyloProfile tool]]
  
-Besides, if you have already run //fCAT// for several query taxa with the same fCAT core set, you can find the merged phylogentic profiles for all of those taxa within the corresponding core set output (e.g. _/path/to/fcat/output/fcatOutput/eukaryota/*.phyloprofile_). +Besides, if you have already run //fCAT// for several query taxa with the same fCAT core set, you can find the merged phylogenetic profiles for all of those taxa within the corresponding core set output (e.g. ///path/to/fcat/output/fcatOutput/eukaryota/*.phyloprofile//).
  
 +To learn how to interpret the phylogenetic profiles using PhyloProfile, please watch [[https://applbio.biologie.uni-frankfurt.de/download/eTransfer/How2Use-PhyloProfile.mp4|this video]].
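As a quick sanity check before opening the merged profiles in PhyloProfile, the number of core groups with at least one ortholog can be counted per taxon. The sketch below is illustrative only: it assumes the merged files are tab-delimited with a header row containing ''geneID'' and ''ncbiID'' columns (the PhyloProfile long format), and the function names ''groups_per_taxon'' and ''summarize_folder'' are hypothetical, not part of fCAT.

```python
import csv
from collections import defaultdict
from pathlib import Path


def groups_per_taxon(profile_path):
    """Count distinct core groups (geneID) found for each taxon (ncbiID).

    Assumption: the file is tab-delimited and its header row contains
    the columns 'geneID' and 'ncbiID'.
    """
    found = defaultdict(set)
    with open(profile_path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            found[row["ncbiID"]].add(row["geneID"])
    return {taxon: len(genes) for taxon, genes in found.items()}


def summarize_folder(core_set_dir):
    """Apply groups_per_taxon to every *.phyloprofile file in a folder."""
    return {
        path.name: groups_per_taxon(path)
        for path in sorted(Path(core_set_dir).glob("*.phyloprofile"))
    }
```

Calling ''summarize_folder'' on the core set output folder returns one dictionary per merged profile file, mapping each taxon ID to the number of core groups in which it has at least one ortholog.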
  
 ====== fCAT score modes ====== ====== fCAT score modes ======
Line 55: Line 58:
 The table below explains how the //specific ortholog group cutoffs// for each fCAT core set were calculated, and which //value of the query ortholog// is used to assess its completeness, or more precisely, its functional equivalence to the ortholog group it belongs to. If the value of a query ortholog is //not less than// its ortholog group cutoff, that group will be evaluated as **similar** or **complete**. In case co-orthologs have been predicted, the assessment for the core group will be **duplicated**. Depending on the value of each single ortholog, a //duplicated// group can be seen as **duplicated (similar)** or **duplicated (dissimilar)** in the full report (e.g. *all_full.txt*). The table below explains how the //specific ortholog group cutoffs// for each fCAT core set were calculated, and which //value of the query ortholog// is used to assess its completeness, or more precisely, its functional equivalence to the ortholog group it belongs to. If the value of a query ortholog is //not less than// its ortholog group cutoff, that group will be evaluated as **similar** or **complete**. In case co-orthologs have been predicted, the assessment for the core group will be **duplicated**. Depending on the value of each single ortholog, a //duplicated// group can be seen as **duplicated (similar)** or **duplicated (dissimilar)** in the full report (e.g. *all_full.txt*).
  
-^ Score mode ^ Cutoff ^ Value ^+^ Score mode ^ Cutoff ^ Value used for comparing ^
 | Mode 1 - Strict mode | Mean of FAS scores between all core orthologs | Mean of FAS scores between query ortholog and all core proteins | | Mode 1 - Strict mode | Mean of FAS scores between all core orthologs | Mean of FAS scores between query ortholog and all core proteins |
-| Mode 2 - Selected mode | Mean of FAS scores between refspec and all other core orthologs | Mean of FAS scores between query ortholog and refspec protein | +| Mode 2 - Reference mode | Mean of FAS scores between refspec and all other core orthologs | Mean of FAS scores between query ortholog and refspec protein | 
-| Mode 3 - Relaxed mode | The lower bound of the confidence interval calculated by the distribution of all-vs-all FAS score in a core group | Mean of FAS scores between query ortholog and refspec protein |+| Mode 3 - Relaxed mode | The lower bound of the confidence interval calculated by the distribution of all-vs-all FAS score in a core group  | Mean of FAS scores between query ortholog and all core proteins |
 | Mode 4 - Length mode | Mean and standard deviation of all core protein lengths | Length of query ortholog | | Mode 4 - Length mode | Mean and standard deviation of all core protein lengths | Length of query ortholog |
  
-//Note: **FAS scores** are bidirectional FAS scors; **core protein** or **core ortholog** is protein in the core ortholog groups; **query protein** or **query ortholog** is ortholog protein of query species; **refspec** is the specified reference species//+{{:ecoevo_molevol:wiki:mee:figures:fcat_scoremode.png?600|}}
  
-<html><img width="756" alt="image" src="https://user-images.githubusercontent.com/19269760/127915571-6fa4ff00-e5f9-4568-a2c5-520b9c830d25.png"></html>+//Note: **FAS scores** are bidirectional FAS scores; a **core protein** or **core ortholog** is a protein in the core ortholog groups; a **query protein** or **query ortholog** is the orthologous protein of the query species; **refspec** is the specified reference species//
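The comparison described for the FAS-based modes can be sketched in a few lines of Python. This is a minimal illustration, not fCAT's actual implementation: the data layout (a dictionary keyed by pairs of core species), the function names and the label ''dissimilar'' for a single ortholog below the cutoff are assumptions. Only the cutoffs of Mode 1 and Mode 2 are computed; the Mode 3 cutoff (confidence interval) and the Mode 4 length criterion are left out because the table does not specify their exact formulas.

```python
from statistics import mean


def mode1_cutoff(core_pair_scores):
    """Mode 1: mean FAS score over all pairs of core orthologs."""
    return mean(core_pair_scores.values())


def mode2_cutoff(core_pair_scores, refspec):
    """Mode 2: mean FAS score between refspec and all other core orthologs."""
    return mean(score for pair, score in core_pair_scores.items() if refspec in pair)


def assess(query_values, cutoff):
    """Label a core group from the value(s) of its query ortholog(s).

    A value not less than the cutoff counts as similar. Several values
    mean co-orthologs were predicted, so the group is duplicated and
    each co-ortholog receives its own label.
    """
    labels = ["similar" if value >= cutoff else "dissimilar" for value in query_values]
    if len(labels) == 1:
        return labels[0]
    return [f"duplicated ({label})" for label in labels]


# Hypothetical bidirectional FAS scores between three core species A, B, C.
core = {
    frozenset({"A", "B"}): 0.9,
    frozenset({"A", "C"}): 0.7,
    frozenset({"B", "C"}): 0.5,
}
# Hypothetical FAS scores between one query ortholog and each core protein.
query_vs_core = {"A": 0.85, "B": 0.75, "C": 0.65}

# Mode 1 compares the mean over all core proteins with the all-vs-all cutoff.
print(assess([mean(query_vs_core.values())], mode1_cutoff(core)))
# Mode 2 compares the score against the refspec protein with the refspec cutoff.
print(assess([query_vs_core["A"]], mode2_cutoff(core, "A")))
```

Keeping the cutoff calculation separate from ''assess'' means the same comparison can be reused for a Mode 3 cutoff that has been computed elsewhere.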
  
 ====== Contact ====== ====== Contact ======