taXaminer
Whole genome shotgun data are a great resource for reconstructing the genome of any target species. At the same time, they are a rich source of contamination, i.e. reads from taxa other than the one you had in mind. Possible sources of contamination are
- taxa living in close association with your target species. Most prominent examples are bacteria of the gut or skin microbiome, or of symbiotic partners.
- reads representing the genome of the person who handled the samples, i.e. human contamination
- contaminated reagents used for extracting or sequencing the DNA
- contamination in the sequencer
Such contamination can be detected on two main levels: on the level of the genome assembly, e.g. with the help of BlobTools, or on the level of the gene set. We will focus on the latter, since it allows us to investigate the nature of the contamination.
We will use the software taXaminer (Fig. 1) to characterize the gene set of C. hominis. In addition to performing the taxonomic assignment using Diamond searches against the NCBI nrProt database, taXaminer determines values for a number of other gene features, such as read coverage, standard deviation of read coverage from the contig mean, gene length, position of the gene (terminal in the contig or not), etc. taXaminer then runs a PCA on these feature vectors and returns, among other information, a 3D plot of the taxonomically labeled PCA in HTML format. To make full use of the taXaminer output, we have developed the tX-dashboard, which you can install locally on your computer.

taXaminer analysis
What you need
- The genome sequence in fasta format. We will be using
/home/ubuntu/Share/Assemblies/crypto_BCM2021_v2.fasta
- The annotation file in gff3 format. We will be using
/home/ubuntu/Share/Analysis/taxaminer/results/metaeuk/Crypto_Metaeuk.sorted.gff3
- optionally: the protein fasta file.
If it is not provided, taXaminer will extract the protein sequences from the gff file.
- optionally: read mapping information, one BAM file per library
/home/ubuntu/fritz/sv-detection/short_reads/illumina_pairs.mapped.sort.bam
- optionally: a local installation of the taxaminer-dashboard.
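Before starting, it can help to confirm that the input files listed above are actually readable from your account. The following is a minimal sketch, not part of taXaminer; the function name check_inputs is made up for this example, and the paths are the ones given in the list above.

```shell
#!/usr/bin/env bash
# check_inputs is a hypothetical helper, not a taXaminer command.
# It prints "ok" or "MISSING" for each path and returns non-zero
# if at least one file is missing or unreadable.
check_inputs() {
    local missing=0
    local f
    for f in "$@"; do
        if [ -r "$f" ]; then
            echo "ok       $f"
        else
            echo "MISSING  $f"
            missing=1
        fi
    done
    return "$missing"
}

# Paths as given in the list above; adjust them to your own setup.
check_inputs \
    /home/ubuntu/Share/Assemblies/crypto_BCM2021_v2.fasta \
    /home/ubuntu/Share/Analysis/taxaminer/results/metaeuk/Crypto_Metaeuk.sorted.gff3 \
    /home/ubuntu/fritz/sv-detection/short_reads/illumina_pairs.mapped.sort.bam \
    || echo "At least one input file is missing or unreadable."
```

Any line marked MISSING points to a path that needs to be corrected before the analysis is configured.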
What you get
- a taxonomic assignment for each gene based on a modified version of the DIAMOND Last Common Ancestor algorithm1)
- a file with feature vectors for each gene in CSV
- an HTML file with the PCA as a 3D plotly plot
- a file with the proteins encoded by the annotated genes
- a text file with the diamond hits
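To get a first impression of which features were recorded for each gene, you can list the column names of the feature CSV file. The sketch below assumes that the file has a header row and plain comma separation; the function name csv_columns and the file name in the example call are placeholders, not names used by taXaminer.

```shell
#!/usr/bin/env bash
# csv_columns is a hypothetical helper, not a taXaminer command.
# It lists the column names of a CSV file, one per line, with their position.
# Assumes a header row and plain comma separation (no quoted commas).
csv_columns() {
    head -n 1 "$1" | tr ',' '\n' | awk '{ printf "%d\t%s\n", NR, $0 }'
}

# Example call (the file name is a placeholder for the CSV in your output folder):
# csv_columns path/to/feature_table.csv
```

The column positions can then be passed to tools such as cut or awk to pull out individual features.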
Running taXaminer
- Check for the presence of taXaminer on your system. To do so:
- activate the conda environment:
conda activate /home/ubuntu/miniconda3/envs/taxaminer
- issue the following command to test if you can run taxaminer:
taxaminer.run -h
- if it is installed, proceed with the next steps
- if it is not:
- create a working directory for the analysis:
mkdir -p $HOME/Analysis/taxaminer
- change into the working directory:
cd $HOME/Analysis/taxaminer
- copy or soft-link the following files into the working directory
- The genome sequence in fasta format
- The genome annotation in gff3 format
- any read mapping information in BAM format. We will be using
/home/ubuntu/test_fritz/for_ingo/illumina_cryptov2.mapped.sort.bam
- We will be using the uniref50 database for the Diamond search.
$HOME/Share/DBs/uniref50/db.dmnd
- edit the configuration file (config.yml) according to your needs
- activate the taxaminer conda environment unless you have already done so3):
conda activate /home/ubuntu/anaconda3/envs/taxaminer
- run taXaminer by issuing the following command4)
taxaminer.run config.yml
Make sure that you are either in the directory where the config.yml is located, or provide the path to it.
Once the taXaminer run has completed, you can download the output to your local computer. Then you can either open the 3D_plot.html directly in a web browser, or use the taXaminer-dashboard to first import the output folder and then load the data.
Are the results in line with your expectations?
Do you find anything suspicious?
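The preparation steps above (creating the working directory and soft-linking the input files) can be sketched as a small shell function. This is only an illustration under stated assumptions: the function name prepare_workdir is made up, the paths are the ones named in the steps above, and the files must be given as absolute paths so that the links resolve.

```shell
#!/usr/bin/env bash
# prepare_workdir is a hypothetical helper, not a taXaminer command.
# It creates a working directory and soft-links each existing input file into it.
# Missing files are reported and skipped, so no dangling links are created.
# Pass absolute paths; relative paths would produce broken links.
prepare_workdir() {
    local workdir="$1"
    shift
    mkdir -p "$workdir" || return 1
    local f
    for f in "$@"; do
        if [ -e "$f" ]; then
            ln -sf "$f" "$workdir/$(basename "$f")"
        else
            echo "skipping missing file: $f" >&2
        fi
    done
}

# Paths as given in the steps above; adjust them to your own setup.
prepare_workdir "$HOME/Analysis/taxaminer" \
    /home/ubuntu/Share/Assemblies/crypto_BCM2021_v2.fasta \
    /home/ubuntu/Share/Analysis/taxaminer/results/metaeuk/Crypto_Metaeuk.sorted.gff3 \
    /home/ubuntu/test_fritz/for_ingo/illumina_cryptov2.mapped.sort.bam \
    || echo "could not create the working directory" >&2
```

After this, change into the working directory, edit the config.yml so that it points to the linked files, and start the run as described above.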