Meanwhile, a plethora of different WGS assemblers exists, and it is hard to decide a priori which assembler performs best for a given genome and WGS data set. Determining how good an assembly is can be very difficult, and there is even a competition, the Assemblathon, that aims to benchmark current state-of-the-art methods in genome assembly (Earl et al. 2011; Bradnam et al. 2013). Still, it remains unclear to what extent the insights from such benchmarks can be generalized to any particular assembly problem. Given the complexity of the assembly problem, it is easily conceivable that an algorithm that performs sub-optimally on all of the benchmark data sets is nevertheless superior for your particular assembly problem. For this reason, separate benchmarks are generated for particular subsets of genomes (e.g. Abbas et al. 2014). As an alternative, Greshake et al. (2016) recently proposed the idea of simulated twin sets: a WGS read set is simulated such that it closely resembles that of the actual sequencing experiment, hence the name 'twin set'. Assemblers can then be custom-benchmarked on the twin set, resulting in a more informed assembler choice.
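To make such a comparison concrete, assembly contiguity is commonly summarized with the N50 statistic. The following is a minimal sketch of how two candidate assemblies could be compared; the FASTA file names are placeholders, and the choice of N50 as the sole metric is an illustrative assumption, not part of the twin-set protocol itself:

<code python>
# Minimal sketch: compare assemblies by N50, a common contiguity metric.
# File names below are hypothetical placeholders.

def contig_lengths(fasta_path):
    """Yield the length of every contig in a FASTA file."""
    length = 0
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                if length:
                    yield length
                length = 0
            else:
                length += len(line.strip())
    if length:
        yield length

def n50(lengths):
    """Smallest contig length such that contigs of at least this length
    cover >= 50% of the total assembly length."""
    lengths = sorted(lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0

if __name__ == "__main__":
    # Hypothetical twin-set benchmark: the same simulated read set assembled
    # by two assemblers, compared on N50 (file names are placeholders).
    for assembly in ["assembler_a_contigs.fasta", "assembler_b_contigs.fasta"]:
        print(assembly, n50(contig_lengths(assembly)))
</code>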
  
In our exercises on de novo whole genome shotgun assembly, we will concentrate on SPAdes (Bankevich et al. 2012). SPAdes constructs multi-sized de Bruijn graphs with different values for //k//. But of course, you are free to further explore the methods space.
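As a minimal illustration of the underlying data structure, the sketch below builds a plain de Bruijn graph for a single value of //k//; SPAdes' actual multi-sized approach builds and combines such graphs over several values of //k// and is considerably more involved. The reads and the value of //k// here are toy assumptions for illustration only:

<code python>
# Minimal sketch of a de Bruijn graph for one value of k.
# Nodes are (k-1)-mers; each k-mer in a read contributes an edge
# from its (k-1)-mer prefix to its (k-1)-mer suffix.

from collections import defaultdict

def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer prefix to the (k-1)-mer suffixes that follow it."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return graph

reads = ["ACGTAC", "CGTACG", "GTACGT"]  # toy reads, not real WGS data
for prefix, suffixes in sorted(de_bruijn_graph(reads, k=4).items()):
    print(prefix, "->", ", ".join(suffixes))
</code>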
===== Task list =====
<WRAP tabs>
  * [[ecoevo_molevoll:topics:genome_assembly|Task list]]
</WRAP>