====== Sequencing entire genomes ======
Despite substantial improvements, sequencing reads are still much shorter than typical genomes. Genome sizes range from a couple of million base pairs (bp) for bacterial genomes up to several billion bp for some eukaryotes (Fig. {{ref>GenomeSizes}}).
{{ :ecoevo_molevol:wiki:mee:figures:genome_sizes.png?400 }}
Distribution of genome sizes across the tree of life (Figure taken from [[https://en.wikipedia.org/wiki/Genome_size|here]]).
The challenge of sequencing genomes of such sizes is typically addressed with a [[https://www.nature.com/scitable/topicpage/complex-genomes-shotgun-sequencing-609/|whole genome shotgun sequencing strategy]]. This approach was introduced in 1995, when a research team set out to sequence the genome of //Haemophilus influenzae// Rd by random sequencing ([[https://pubmed.ncbi.nlm.nih.gov/7542800/|Fleischmann et al. 1995]]), and it replaced the traditional hierarchical shotgun sequencing approach. Five years later, the same lab used this technology to sequence the genome of a eukaryote, //Drosophila melanogaster//, which is about 100 times larger than the bacterial genome ([[https://pubmed.ncbi.nlm.nih.gov/10731132/|Adams et al. 2000]]). Nowadays, virtually all genome sequencing efforts are whole genome shotgun approaches.
In a nutshell, the genomic DNA is first sheared at random positions into overlapping fragments, which are later often referred to as //inserts//. Short DNA segments of known sequence, so-called //adapters//, are then attached to these inserts to provide the starting points necessary for sequencing. Among others, these can be primer binding sites for fragment amplification via PCR and for the sequencing itself (e.g. Illumina adapters or the bell-shaped adapters used by PacBio), or adapters carrying the motor protein that is necessary to pull the DNA molecule through a Nanopore (cf. Figure {{ref>longread}}). The collection of resulting fragments is referred to as a //shotgun library//. Depending on whether only one end or both ends of these shotgun fragments are sequenced, they are referred to as single-end or paired-end shotgun libraries. Once a shotgun library has been sequenced, typically to a coverage between 60x and 100x (i.e. each position in the genome is covered by 60 to 100 reads on average), the genome is reconstructed from these data.
{{ :general:bioseqanalysis:images:shotgun-approach.png?400 |}}
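The relation between genome size, read length, and coverage can be illustrated with a small simulation. The following is a minimal sketch and not a model of any real sequencing platform: it assumes a random toy genome of 10,000 bp, error-free single-end reads of a fixed length of 100 bp, and uniformly distributed start positions. The function names ''reads_needed'', ''shotgun_reads'', and ''per_base_depth'' are hypothetical and chosen only for this illustration. Mean coverage is computed as number of reads times read length divided by genome size.

```python
import math
import random


def reads_needed(genome_size, read_length, coverage):
    """Smallest number of reads N with N * read_length / genome_size >= coverage."""
    return math.ceil(coverage * genome_size / read_length)


def shotgun_reads(genome, read_length, n_reads, rng):
    """Sample fixed-length reads from uniformly random start positions.

    Simplifying assumption: reads are error-free and never run past
    the end of the (linear) genome.
    """
    last_start = len(genome) - read_length
    starts = [rng.randint(0, last_start) for _ in range(n_reads)]
    return [(start, genome[start:start + read_length]) for start in starts]


def per_base_depth(genome_length, reads):
    """Count how many reads cover each genome position."""
    depth = [0] * genome_length
    for start, seq in reads:
        for pos in range(start, start + len(seq)):
            depth[pos] += 1
    return depth


rng = random.Random(42)  # fixed seed so the toy example is reproducible
genome = "".join(rng.choice("ACGT") for _ in range(10_000))

n = reads_needed(len(genome), read_length=100, coverage=60)
reads = shotgun_reads(genome, 100, n, rng)
depth = per_base_depth(len(genome), reads)
mean_depth = sum(depth) / len(depth)

print(f"reads sampled: {n}")
print(f"mean depth:    {mean_depth:.1f}")
print(f"min/max depth: {min(depth)}/{max(depth)}")
```

Although the mean depth matches the target coverage, the depth at individual positions varies because the fragments start at random positions. This variation is one reason why shotgun libraries are sequenced to a coverage well above 1x.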