Differences

This shows you the differences between two versions of the page.

general:legacy_backup [2023/10/16 11:21] – created felix
general:legacy_backup [2023/10/16 11:22] (current) felix
Line 116:
  
 The taxon set we will use for this analysis consists of //L. hispanica//, //L. pustulata//, the closely related species //U. muehlenbergii//, and the set of 78 QfO taxa (what is the “QfO”? Why do we work with these 78 taxa?).
-==== Install fDOG and PhyloProfile ==== 
  
-fDOG is already installed in the conda environment **/home/vinh/anaconda3/envs/ecoevo**. PhyloProfile was probably installed during your analysis with fCAT.
- 
-==== Perform fDOG search ==== 
-  - Collecting data 
-    - Seed proteins: Identify the 4 different types of **candidate //L. hispanica// proteins** and save __each sequence in a separate fasta file__ - __each group in their own directory__. They will be the seed genes/proteins for the fDOG search. :!: Hint: You need the protein file of //L. hispanica// from Funannotate to get the sequences! <code> 
-# please find the solution by yourself ;) You will need at least these functions: for, grep, cut and less / cat 
-</code> 
-    - Taxon data (you can check [[https://github.com/BIONF/fDOG/wiki|the wiki of fDOG]] to understand the data types as well as the data structure of fDOG): 
-      - Create your own folder for storing the data for fDOG <code>mkdir /your/working/directory/fdog_data</code> 
-      - Add //**U. muehlenbergii**// to that data folder <code> 
-fdog.addTaxon -f /share/gluster/CoreSets/lecanoromycetes/genome_dir/UmbMu@87280@1/UmbMu@87280@1.fa -i 87280 -v ecoevo22 -c -a --cpus 4 --replace -o /path/to/your/fdog_data 
-</code> 
-      - Check in your fDOG data folder (the path given at the end of the ''fdog.addTaxon'' command above) that **UMBMU@87280@ecoevo22** is present in **genome_dir** and **blast_dir** and that those folders are not empty. Then add the FAS annotation for this species to the **weight_dir** folder <code> 
-mkdir /your/fdog_data/weight_dir 
-ln -s /share/gluster/CoreSets/lecanoromycetes/weight_dir/UmbMu@87280@1.json /your/fdog_data/weight_dir/UMBMU@87280@ecoevo22.json 
-</code> 
-      - Do the same for your own //**L. hispanica**// protein set, as well as the protein set of //**L. pustulata**// that you have downloaded from NCBI<code> 
-fdog.addTaxon -f /path/to/your/l_hispanica_proteins.fa -i 580046 -v ecoevo22 -c -a --cpus 4 --replace -o /path/to/your/fdog_data 
-ln -s /path/to/your/FAS/annotation/of/l_hispanica.json /path/to/your/fdog_data/weight_dir/LASHI@580046@ecoevo22.json 
-fdog.addTaxon -f /path/to/your/l_pustulata_proteins.fa -i 136370 -v ecoevo22 -c -a --cpus 4 --replace -o /path/to/your/fdog_data 
-ln -s /path/to/your/FAS/annotation/of/l_pustulata.json /path/to/your/fdog_data/weight_dir/LASPU@136370@ecoevo22.json 
-</code> 
-      - We also need the data for the 78 QfO taxa <code> 
-cd /your/fdog_data/ 
-ln -s /share/gluster/Projects/vinh/ecoevo/fdog_data/weight_dir/* weight_dir/ 
-ln -s /share/gluster/Projects/vinh/ecoevo/fdog_data/genome_dir/* genome_dir/ 
-ln -s /share/gluster/Projects/vinh/ecoevo/fdog_data/blast_dir/* blast_dir/ 
-</code> 
-  - Now you can run fDOG (using Slurm with 8 CPUs and 8 GB of memory). Make sure that the input folder for ''fdogs.run'' contains only the FASTA files of the //L. hispanica// candidate genes from step 1a. :!: Check the fDOG manual for the meaning of each option you use here :!::!: <code> 
-# add one command like this for each group of candidate genes to your SLURM script (or you can make 4 SLURM scripts, each for one seed directory) 
-fdogs.run --input /path/to/folder/containing/seed/proteins --jobName <your-job-name> --refspec LASHI@580046@ecoevo22 --blastpath /path/to/your/fdog/data/blast_dir --searchpath /path/to/your/fdog/data/genome_dir --weightpath /path/to/your/fdog/data/weight_dir --CorecheckCoorthologsRef --checkCoorthologsRef --force --cpu 8 
-</code> 
-  - fDOG will write ''your-job-name.extended.fa'', ''your-job-name.phyloprofile'', ''your-job-name_forward.domains'' and ''your-job-name_reverse.domains''. You can now explore them with the PhyloProfile tool 
-    - Upload *.phyloprofile and *_forward.domains into PhyloProfile 
-    - Select //Lasallia hispanica// as the reference taxon and plot the profiles 
-    - Apply clustering to bring the similar profiles together 
-    - You can use the function //"Gene age estimation"// to estimate the evolutionary age of your candidate proteins 
-    - In the //"Detailed plot"// you can find a link to the UniProt database, where you can look up the characteristics of the proteins 
-    - From the //"Detailed plot"// you can also generate the domain architecture plot to compare your protein of interest with the orthologous reference protein from //L. hispanica// 
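Step 2 above can be wrapped in a single Slurm batch script. The following is a minimal sketch, not a tested course solution: it assumes that each group of seed proteins sits in its own sub-directory of a hypothetical ''SEED_ROOT'' folder and that the directory name is reused as the job name. ''SEED_ROOT'', ''FDOG_DATA'', the ''run_fdogs'' helper and the ''DRY_RUN'' switch are illustrative names and not part of fDOG; the ''fdogs.run'' options are copied unchanged from the command in step 2.

```shell
#!/bin/sh
#SBATCH --job-name=fdog_seeds
#SBATCH --cpus-per-task=8
#SBATCH --mem=8G

# Hypothetical locations: adjust both to your own working directory.
FDOG_DATA="${FDOG_DATA:-/path/to/your/fdog_data}"
SEED_ROOT="${SEED_ROOT:-/path/to/your/seed_groups}"

# Run fdogs.run for one seed directory.
# If DRY_RUN is set to a non-empty value, the command is only printed.
run_fdogs() {
    seed_dir=$1
    job_name=$2
    data_dir=$3
    ${DRY_RUN:+echo} fdogs.run \
        --input "$seed_dir" \
        --jobName "$job_name" \
        --refspec LASHI@580046@ecoevo22 \
        --blastpath "$data_dir/blast_dir" \
        --searchpath "$data_dir/genome_dir" \
        --weightpath "$data_dir/weight_dir" \
        --CorecheckCoorthologsRef --checkCoorthologsRef --force --cpu 8
}

# One fdogs.run call per seed group; entries that are not directories are skipped.
for seed_dir in "$SEED_ROOT"/*/; do
    [ -d "$seed_dir" ] || continue
    seed_dir=${seed_dir%/}
    run_fdogs "$seed_dir" "$(basename "$seed_dir")" "$FDOG_DATA"
done
```

Running the script once with ''DRY_RUN=1'' set in the environment prints the four commands without starting fDOG, which is a cheap way to check the paths before submitting the real job. The environment that provides ''fdogs.run'' still has to be active when the job starts.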
- 
-<hidden OPTIONAL but highly recommended :-P> 
-//**This is an example for analysing one specific gene of interest**// 
-    - Searching for DHFR (dihydrofolate reductase) in //L. pustulata// and //L. hispanica//. The story behind this analysis is that this protein [[https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7186782/pdf/evaa049.pdf|has been observed]] to be present in //L. hispanica// but absent in //L. pustulata//. Here we check this finding using fDOG. 
-    - First, we need a DHFR protein as the seed sequence. Search for the human version of this protein in the UniProt database. Get its UniProt ID and check whether it is part of the human gene set in your fDOG data. The human gene set of fDOG can be found at ''/path/to/your/fdog/data/genome_dir/HUMAN@9606@3''. Extract this protein sequence and save it as a file in FASTA format. It will be the input seed sequence for the fDOG run. 
-    - Make sure that you have already added //L. pustulata// and //L. hispanica// to the fDOG data set (check for LASHI@580046@ecoevo22 and LASPU@136370@ecoevo22 in /your/fdog_data/genome_dir, blast_dir and weight_dir) 
-    - Now we can run fDOG with the DHFR protein. **//NOTE//**//: for one seed sequence, we need ''fdog.run'', not ''fdogs.run''// <code> 
-fdog.run --seqFile /path/to/human/DHFR.fasta --seqName dhfr --refspec HUMAN@9606@3 --blastpath /path/to/your/fdog_data/blast_dir --searchpath /path/to/your/fdog_data/genome_dir  --weightpath /path/to/your/fdog_data/weight_dir --CorecheckCoorthologsRef --checkCoorthologsRef --force --cpu 8 
-</code> 
-    - Finally, check the phylogenetic profile of the DHFR protein to see whether you can find an ortholog in //L. hispanica// and in //L. pustulata// 
-</hidden>
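The repeated instruction to check **genome_dir**, **blast_dir** and **weight_dir** for a taxon can be scripted. This is a minimal sketch under the layout described above (one non-empty folder per taxon in ''genome_dir'' and ''blast_dir'', one ''<taxon>.json'' file in ''weight_dir''); ''check_taxon'' is a hypothetical helper name, not an fDOG command.

```shell
# Report whether a taxon is present in the three fDOG data folders.
# Usage: check_taxon <fdog_data_dir> <taxon_name>
# Returns 0 if everything was found, 1 otherwise.
check_taxon() {
    data_dir=$1
    taxon=$2
    status=0

    # genome_dir and blast_dir must each hold a non-empty folder for the taxon.
    for sub in genome_dir blast_dir; do
        dir="$data_dir/$sub/$taxon"
        if [ -d "$dir" ] && [ -n "$(ls -A "$dir")" ]; then
            echo "OK       $sub/$taxon"
        else
            echo "MISSING  $sub/$taxon"
            status=1
        fi
    done

    # weight_dir holds the FAS annotation; -s follows symlinks and
    # is true only if the target exists and is not empty.
    json="$data_dir/weight_dir/$taxon.json"
    if [ -s "$json" ]; then
        echo "OK       weight_dir/$taxon.json"
    else
        echo "MISSING  weight_dir/$taxon.json"
        status=1
    fi

    return $status
}
```

For example, ''check_taxon /path/to/your/fdog_data LASPU@136370@ecoevo22'' prints one line per folder. Because the annotation files are symlinks, a broken link target also shows up as ''MISSING''.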