===== Software list =====
Below is a toolbox of programs for the analysis of biological sequences. Not all of these tools will be used in the course, but feel free to browse the list to see what is currently available.
^Type^Name^Description^%%GUI%%^Installation^%%URL%%|
^Read simulation|ART|Illumina, 454 and Solid read simulator|yes|[[https://bioconda.github.io/recipes/art/README.html|bioconda]]|[[https://www.niehs.nih.gov/research/resources/software/biostatistics/art/index.cfm|Link]]|
^Read processing|FastQC|Generates summary statistics and overview information for DNA and RNA seq data|yes|[[https://anaconda.org/bioconda/fastqc|bioconda]]|[[https://www.bioinformatics.babraham.ac.uk/projects/fastqc/|Link]]|
| |SequelTools|Software package for the quality control, filtering and subread selection of PacBio reads|no|[[https://github.com/ISUgenomics/SequelTools#installation|manual]]|[[https://github.com/ISUgenomics/SequelTools|Link]]|
| |Trimmomatic|Adapter clipping and quality trimming of short read data|no|[[https://anaconda.org/bioconda/trimmomatic|bioconda/conda-forge]]|[[http://www.usadellab.org/cms/?page=trimmomatic|Link]]|
^Transcriptome assembly|Trinity|De novo reconstruction of transcriptomes from RNA-seq data|no|[[https://anaconda.org/bioconda/trinity|bioconda]]|[[https://github.com/trinityrnaseq/trinityrnaseq/wiki|Link]]|
^Genome assembly|Flye| Fast and accurate de novo assembler for single molecule sequencing reads (suitable for PacBio HiFi reads)|no|[[https://anaconda.org/bioconda/flye|bioconda]]|[[https://github.com/fenderglass/Flye|Link]]|
| |Canu |Assembler for high-noise single-molecule sequencing data (overlap-based) |no |[[https://anaconda.org/bioconda/canu|bioconda]]|[[https://github.com/marbl/canu|Link]] |
| |Velvet |Sequence assembler for short reads (de Bruijn graph)|no |[[https://anaconda.org/bioconda/velvet|bioconda]]|[[https://www.ebi.ac.uk/~zerbino/velvet/|Link]] |
| |SPAdes |Assembler intended for both standard isolates and single-cell MDA bacterial assemblies (de Bruijn graph)|no |[[https://anaconda.org/bioconda/spades|bioconda]]|[[http://cab.spbu.ru/software/spades/|Link]]|
| |Mira |Whole genome shotgun and EST sequence assembler for Sanger, 454, Solexa (Illumina), IonTorrent data and PacBio (Overlap)|no |[[https://anaconda.org/bioconda/mira|bioconda]]|[[https://sourceforge.net/p/mira-assembler/wiki/Home/|Link]]|
^Database search|NCBI Blast|Heuristic for the rapid identification of significantly similar sequences in a sequence database using local alignments|no|[[https://anaconda.org/bioconda/blast|bioconda]]|[[https://blast.ncbi.nlm.nih.gov/Blast.cgi|Link]]|
| |NCBI Legacy Blast|Heuristic for the rapid identification of significantly similar sequences in a sequence database using local alignments. These C toolkit binaries are no longer maintained or supported|no|[[https://anaconda.org/biocore/blast-legacy|biocore]]|[[ftp://ftp.ncbi.nlm.nih.gov/blast/executables/legacy.NOTSUPPORTED/|Link]]|
| |Diamond|Accelerated BLAST compatible local sequence aligner|no|[[https://anaconda.org/bioconda/diamond|bioconda]]|[[https://github.com/bbuchfink/diamond|Link]]|
^Gene prediction|maker|A comprehensive pipeline for genome annotation|no|[[https://anaconda.org/bioconda/maker|bioconda]]|[[http://www.yandell-lab.org/software/maker.html|Link]]|
| |funannotate|A pipeline for gene annotation in fungal genomes|no|[[https://anaconda.org/bioconda/funannotate|bioconda]]|[[https://github.com/nextgenusfs/funannotate|link]]|
| |Transdecoder|Identify candidate coding regions within transcript sequences|no|[[https://anaconda.org/bioconda/transdecoder|bioconda]]|[[https://github.com/TransDecoder/TransDecoder/wiki|Link]]|
^Gene set completeness|Busco|Assessing genome assembly and annotation completeness with Benchmarking Universal Single-Copy Orthologs|no|[[https://anaconda.org/bioconda/busco|bioconda]]|[[https://busco.ezlab.org/|Link]]|
| |fCAT|Gene set completeness assessment tool using domain-architecture aware targeted ortholog searches|no|[[https://github.com/BIONF/fcat#how-to-install|pip]] |[[https://github.com/BIONF/fcat|link]]|
^Repeat annotation|Repeat Masker|Smith-Waterman based identification and optional masking of repeats provided in a repeat database|no|[[https://anaconda.org/bioconda/repeatmasker|bioconda]]|[[http://www.repeatmasker.org/|Link]]|
^Genome visualization|JBrowse Desktop|Desktop version of JBrowse that does not need any web server configuration|yes|[[https://jbrowse.org/blog/|manually]]|[[http://gmod.org/wiki/JBrowse_Desktop|Link]]|
| |JBrowse|A browser based viewer for genomes and genome-wide annotation|yes|[[https://anaconda.org/bioconda/jbrowse|bioconda]]|[[https://jbrowse.org/blog/|Link]]|
| |IGV|Integrative Genomics Viewer. Fast, efficient, scalable visualization tool for genomics data and annotations |yes|[[https://anaconda.org/bioconda/igv|bioconda]]|[[http://software.broadinstitute.org/software/igv/home|Link]]|
| |igvtools|The igvtools utility provides a set of tools for pre-processing data files.|no|[[https://anaconda.org/bioconda/igvtools|bioconda]]|[[https://software.broadinstitute.org/software/igv/igvtools|Link]]|
^Structural variant detection|Lumpy|A general probabilistic framework for structural variant discovery|no|[[https://anaconda.org/bioconda/lumpy-sv|bioconda]]|[[https://github.com/arq5x/lumpy-sv|Link]]|
| |Delly|An SV caller leveraging multiple signals|no|[[https://anaconda.org/bioconda/delly|bioconda]]|[[https://github.com/dellytools/delly|Link]]|
| |Manta|An SV caller leveraging multiple signals|no|[[https://anaconda.org/bioconda/manta|bioconda]]|[[https://github.com/Illumina/manta|Link]]|
^Structural variation comparison |SURVIVOR|A toolkit to compare, merge, and generate statistics for SV VCF files|no|[[https://anaconda.org/bioconda/survivor|bioconda]]|[[https://github.com/fritzsedlazeck/SURVIVOR|Link]]|
^SAM/BAM manipulation |Samtools|A toolkit for working with SAM/BAM files|no|[[https://anaconda.org/bioconda/samtools|bioconda]]|[[https://github.com/samtools/samtools|Link]]|
| |BCFtools|A set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF |no |[[https://anaconda.org/bioconda/bcftools|bioconda]]|[[https://github.com/samtools/bcftools|Link]]|
^Read mapping |BWA|A method to align short reads|no|[[https://anaconda.org/bioconda/bwa|bioconda]]|[[https://github.com/lh3/bwa|Link]]|
| |Bowtie2|Ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences|no|[[https://anaconda.org/bioconda/bowtie2|bioconda]]|[[http://bowtie-bio.sourceforge.net/bowtie2/index.shtml|Link]]|
^SNP caller |xAtlas|A method to detect SNPs and indels|no|[[https://anaconda.org/bioconda/xatlas|bioconda]]|[[https://github.com/jfarek/xatlas|Link]]|
^RNA-Seq mapping |STAR|A method to align RNA-Seq data|no|[[https://anaconda.org/bioconda/star|bioconda]]|[[https://github.com/alexdobin/STAR|Link]]|
| |Hisat2|A method to align RNA-Seq data|no|[[https://anaconda.org/bioconda/hisat2|bioconda]]|[[https://github.com/infphilo/hisat2|Link]]|
| |Kallisto|A method to quantify transcript abundances from RNA-Seq data using pseudoalignment|no|[[https://anaconda.org/bioconda/kallisto|bioconda]]|[[https://pachterlab.github.io/kallisto/download|Link]]|
| |TopHat|A spliced read mapper for RNA-Seq |no|[[https://anaconda.org/bioconda/tophat|bioconda]]|[[http://ccb.jhu.edu/software/tophat/index.shtml|Link]]|
^Assembly evaluation|Quast |Quality Assessment Tool for Genome Assemblies |yes |[[https://anaconda.org/bioconda/quast|bioconda]] |[[http://quast.sourceforge.net/|Link]] |
| |RnaQuast|Quality evaluation tool for assembled transcripts| |[[https://anaconda.org/bioconda/rnaquast|bioconda]]|[[http://cab.spbu.ru/software/rnaquast/|Link]]|
^Taxonomic Assignment|Megan |Interactive microbiome analysis tool|no |[[http://ab.inf.uni-tuebingen.de/software/megan6/|manually]] |[[http://ab.inf.uni-tuebingen.de/software/megan6/|Link]] |
| |Krona Tools | Interactive viewer for metagenome composition | yes | [[https://anaconda.org/bioconda/krona|bioconda]]|[[https://github.com/marbl/Krona/wiki|Link]] |
^Ortholog search|InParanoid|Pairwise ortholog search tool |no |[[http://software.sbc.su.se/cgi-bin/request.cgi?project=inparanoid|manually]] |[[http://software.sbc.su.se/cgi-bin/request.cgi?project=inparanoid|Link]] |
| |OMA |Ortholog matrix project | no |[[https://omabrowser.org/standalone/#downloads|manually]] |[[https://omabrowser.org/standalone/|Link]] |
| |fDOG |Targeted ortholog search tool |no |[[https://github.com/BIONF/fDOG|pip]] |[[https://github.com/BIONF/fDOG|Link]] |
^ Phylogenetic Profiling|PhyloProfile |A browser based tool for visualizing and exploring phylogenetic profiles |yes |[[https://github.com/BIONF/PhyloProfile|Bioconductor]] |[[https://github.com/BIONF/PhyloProfile|Link]] |
^Phylogeny reconstruction|RAxML |Phylogenetics - Randomized Axelerated Maximum Likelihood |no |[[https://anaconda.org/bioconda/raxml|bioconda]] |[[http://sco.h-its.org/exelixis/web/software/raxml/index.html|Link]] |
| |ProtTest3 |Tool for selecting the best-fit model of amino acid replacement for the data at hand|yes|[[https://github.com/ddarriba/prottest3|manually]]|[[https://github.com/ddarriba/prottest3|Link]]|
^Tree visualization|FigTree|Graphical viewer of phylogenetic trees|yes|[[https://github.com/rambaut/figtree/|manually]]|[[https://github.com/rambaut/figtree/|Link]]|
| |iTOL|Interactive software for the visualisation and annotation of phylogenetic trees|yes|[[https://itol.embl.de|Web tool]]|[[https://itol.embl.de|Link]]|
| |matt|Interactive tree visualisation, modification and topology testing|yes|[[https://github.com/BIONF/matt#installation|pip]]|[[https://github.com/BIONF/matt|Link]]|
^Sequence alignment|Muscle|Multiple sequence alignment|no|[[https://anaconda.org/bioconda/muscle|bioconda]]|[[https://www.drive5.com/muscle/manual/index.html|Link]]|
===== Additional installation information =====
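Most tools in the table above are distributed through bioconda. The following is a minimal sketch of how several of them could be installed into one conda environment. The environment name ''compgen'' is taken from the path used in the RepeatMasker section below, the package selection is only an illustration, and the ''RUN_INSTALL'' switch is a hypothetical variable introduced here so that the command is printed by default and executed only on request.

```shell
#!/bin/sh
# Sketch: assemble a conda command that installs a few bioconda packages
# into a dedicated environment. Names below are illustrative assumptions.
ENV_NAME="compgen"                      # environment name (assumed)
PACKAGES="fastqc trimmomatic samtools"  # package names as listed on bioconda

# conda-forge is listed before bioconda, as recommended by the bioconda project
CMD="conda create --yes -n $ENV_NAME -c conda-forge -c bioconda $PACKAGES"

# Always show the command so it can be inspected first
echo "$CMD"

# Execute only when explicitly requested (RUN_INSTALL is a hypothetical switch)
if [ "${RUN_INSTALL:-0}" = "1" ]; then
    $CMD
fi
```

Afterwards, the environment can be activated with ''conda activate compgen''.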
==== RepeatMasker ====
Once you have installed RepeatMasker, either directly or as part of the maker pipeline, you will need to install the [[https://www.girinst.org/server/RepBase/index.php|RepBase repeat library]]. Note that you have to complete a free registration before you can download the file. Download the current release, identify the location of your RepeatMasker installation, and move the archive file named //RepBaseRepeatMaskerEdition-20170127.tar.gz// into the RepeatMasker directory in ''~/anaconda/envs/compgen/share/RepeatMasker''. Then unpack it in that directory by typing ''tar -xzf RepBaseRepeatMaskerEdition-20170127.tar.gz''.
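The two steps described above, moving the archive and unpacking it, can be sketched as a small shell function. The function name ''install_repbase'' and the download location ''~/Downloads'' are assumptions made for illustration; the RepeatMasker path may also differ in your installation.

```shell
#!/bin/sh
# Sketch: place a downloaded RepBase archive into the RepeatMasker directory
# and unpack it there. Function name and arguments are hypothetical.
install_repbase() {
    archive="$1"   # path to the downloaded RepBase archive
    rm_dir="$2"    # RepeatMasker directory of your installation

    # Stop early if either input is missing
    [ -f "$archive" ] || { echo "archive not found: $archive" >&2; return 1; }
    [ -d "$rm_dir" ] || { echo "directory not found: $rm_dir" >&2; return 1; }

    # Move the archive into the RepeatMasker directory and unpack it in place
    mv "$archive" "$rm_dir/" || return 1
    tar -xzf "$rm_dir/$(basename "$archive")" -C "$rm_dir"
}

# Example call (download location is an assumption; adjust both paths):
# install_repbase ~/Downloads/RepBaseRepeatMaskerEdition-20170127.tar.gz \
#     ~/anaconda/envs/compgen/share/RepeatMasker
```

The existence checks make the function fail with a message instead of moving a file into the wrong place when one of the paths is mistyped.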
==== JBrowse ====
If you want to use the [[http://gmod.org/wiki/JBrowse_Desktop|desktop version of JBrowse]] for your work, download the version that matches your operating system from the [[https://jbrowse.org/blog/|JBrowse website]] and follow the installation guidelines provided with the software.
===== Databases =====
**Databases and web tools**:
* NCBI - [[https://www.ncbi.nlm.nih.gov/|https://www.ncbi.nlm.nih.gov/]]
* KEGG - [[http://www.genome.jp/kegg/|http://www.genome.jp/kegg/]]
* ENSEMBL - [[http://www.ensembl.org/index.html|http://www.ensembl.org/index.html]]
* Uniprot - [[http://www.uniprot.org/|http://www.uniprot.org/]]
* PFAM - [[http://pfam.xfam.org/|http://pfam.xfam.org/]]
* InterPro - [[https://www.ebi.ac.uk/interpro/]]
* HMMER - [[http://hmmer.org/|http://hmmer.org/]]
* gNOME - [[http://ghubs.izn-ffm.intern:5000/g-nom/assemblies/list]] :!: This is an ongoing project, and we will be working with an alpha version of this web tool. The %%URL%% is reachable only from within our network.