Differences

This shows you the differences between two versions of the page.

general:bioseqanalysis:readsimulation [2021/10/19 20:07] – created ingogeneral:bioseqanalysis:readsimulation [2021/10/19 20:16] (current) ingo
Line 1: Line 1:
 ====== Simulation of WGS ======
  
-The simulation of a read library based on an existing sequence is often used for quality control and benchmarking of next-generation sequencing methods. This is possible because the simulation provides a read library under controlled conditions. Thus, libraries can be simulated under ideal, realistic, or highly adverse conditions. Simulating read sets like those obtained from large-scale sequencing projects has become common, and many different tools have been developed for this purpose. An overview is given in {{:ecoevo_molevol:wiki:mee:literature:escalona2016.natrevgenet.pdf|Escalona et al. (2016)}}. Figure {{ref>Simulation}} from this publication gives an overview of the available tools. <figure Simulation> {{  :ecoevo_molevol:wiki:mee:figures:readset-simulation.png?600  }} <caption><fs 0.8em>Decision tree from [[https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5224698/|Escalona et al. (2016)]] of how to use which simulator for high-throughput sequencing data sets.</fs></caption></figure>
+The simulation of a read library based on an existing sequence is often used for quality control and benchmarking of next-generation sequencing methods. This is possible because the simulation provides a read library under controlled conditions. Thus, libraries can be simulated under ideal, realistic, or highly adverse conditions. Simulating read sets like those obtained from large-scale sequencing projects has become common, and many different tools have been developed for this purpose. An overview is given in {{:ecoevo_molevol:wiki:mee:literature:escalona2016.natrevgenet.pdf|Escalona et al. (2016)}}. Figure {{ref>Simulation}} from this publication gives an overview of the available tools.
 +Simulated data have the advantage that every step of the data generation can be controlled, so a gold standard is available for each step of a biosequence analysis. This makes it possible to assess precisely the performance of each algorithm used during data analysis ([[https://onlinelibrary.wiley.com/doi/10.1111/1755-0998.12463|Greshake, et al. 2016]]).
 +<figure Simulation> {{  :ecoevo_molevol:wiki:mee:figures:readset-simulation.png?600  }} <caption><fs 0.8em>Decision tree from [[https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5224698/|Escalona et al. (2016)]] of how to use which simulator for high-throughput sequencing data sets.</fs></caption></figure>
  
 [[https://www.niehs.nih.gov/research/resources/software/biostatistics/art/index.cfm|ART]] ([[https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3278762/|Huang, et al. 2012]]) is a simulation tool to generate synthetic next-generation sequencing reads. ART simulates sequencing reads by mimicking the real sequencing process with empirical error models or quality profiles summarized from large re-calibrated sequencing data.
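The general idea of quality-profile-driven read simulation can be illustrated with a small Python sketch. This is not ART's algorithm or interface: the function names, the uniform sampling of read start positions, and the substitution-only error model are assumptions made purely for illustration. Each read keeps its true start position, which is the kind of gold standard that simulated data provide.

```python
import random


def phred_to_error_prob(q):
    """Convert a Phred quality score into a base-call error probability."""
    return 10 ** (-q / 10)


def simulate_reads(reference, read_length, n_reads, quality_profile, seed=0):
    """Sample reads uniformly from `reference` and add substitution errors.

    `quality_profile[i]` is the Phred score assumed for read position i
    (a hypothetical, hand-made profile; real tools derive it from data).
    Returns a list of (start, read_sequence, n_errors) tuples, where
    `start` is the true origin of the read on the reference.
    """
    if len(quality_profile) != read_length:
        raise ValueError("quality_profile must have one score per read position")
    if read_length > len(reference):
        raise ValueError("read_length exceeds the reference length")

    rng = random.Random(seed)
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(reference) - read_length + 1)
        bases = []
        n_errors = 0
        for i, base in enumerate(reference[start:start + read_length]):
            # Introduce a substitution with the probability implied by the score.
            if rng.random() < phred_to_error_prob(quality_profile[i]):
                base = rng.choice([b for b in "ACGT" if b != base])
                n_errors += 1
            bases.append(base)
        reads.append((start, "".join(bases), n_errors))
    return reads
```

A call such as `simulate_reads("ACGT" * 50, 20, 10, [30] * 15 + [10] * 5)` would yield ten 20 bp reads whose last five positions are much more error-prone than the first fifteen. Because the true origin of every read is recorded, the output of a read mapper run on these reads could be checked against it directly.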