Nucleic Acids Res. 2010 Jul;38(12):e131.
doi: 10.1093/nar/gkq224. Epub 2010 Apr 14.

Biases in Illumina transcriptome sequencing caused by random hexamer priming

Kasper D Hansen et al. Nucleic Acids Res. 2010 Jul.

Abstract

Generation of cDNA using random hexamer priming induces biases in the nucleotide composition at the beginning of transcriptome sequencing reads from the Illumina Genome Analyzer. The bias is independent of organism and laboratory and impacts the uniformity of the reads along the transcriptome. We provide a read count reweighting scheme, based on the nucleotide frequencies of the reads, that mitigates the impact of the bias.
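
The reweighting idea can be illustrated with a minimal Python sketch. It assumes, since the abstract does not give the formula, that each read is weighted by the ratio of how often its leading k-mer occurs at a later reference offset in the reads to how often it occurs at the read start. The function names, the default k = 6 and the reference offset are hypothetical choices for illustration, not the authors' published scheme.

```python
from collections import Counter

def kmer_freqs(reads, start, k=6):
    """Relative frequency of each k-mer found at 0-based offset `start`."""
    counts = Counter(r[start:start + k] for r in reads if len(r) >= start + k)
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}

def read_weights(reads, k=6, ref_start=24):
    """One weight per read (assumed form, not taken from the paper):
    frequency of the read's first k-mer at the reference offset,
    divided by its frequency at the read start."""
    head = kmer_freqs(reads, 0, k)          # biased start-of-read frequencies
    ref = kmer_freqs(reads, ref_start, k)   # assumed less biased reference
    weights = []
    for r in reads:
        if len(r) < k:
            weights.append(1.0)             # too short to classify: leave as is
            continue
        first = r[:k]
        # A k-mer never seen at the reference offset gets weight 0 here.
        weights.append(ref.get(first, 0.0) / head[first])
    return weights
```

Reweighted base-level counts would then be obtained by summing these weights, rather than counting 1 per read, at each read start position. Weights below 1 down-weight reads whose leading k-mer is over-represented at the start of reads.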

Figures

Figure 1.
Nucleotide frequencies versus position for stringently mapped reads. For each experiment, mapped reads were extended upstream of the 5′-start position, such that the first position of the actual read is 1 and positions 0 to −20 are obtained from the genome. The first hexamer of the read is shaded. Brief experimental protocols are indicated in the key. (a) RNA-Seq experiments conducted using priming with random hexamers, with and without RNA fragmentation. (b) DNA resequencing and ChIP-Seq experiments. (c) RNA-Seq experiments with alternative library preparation protocols, including priming with random hexamers followed by fragmentation using DNase I and priming with oligo(dT) followed by fragmentation using either DNase I, nebulization or sonication.
Figure 2.
Hexamer frequencies. (a) The logarithm (base 2) of all (4096) observed hexamer frequencies computed using positions 1–6 of the aligned reads for an experiment in H. sapiens (8) versus an experiment in S. cerevisiae (9). The two distributions have a correlation of formula image. (b) As in (a), but the hexamers correspond to positions 25–30 of the aligned reads, with a correlation of formula image.
Figure 3.
Nucleotide frequencies versus position for stringently mapped stranded reads for the A nucleotide. (a and b) As in Figure 1a, but split according to whether reads map to the sense or antisense strand. (c) Difference between the frequencies in (a and b).
Figure 4.
Evaluation of the reweighting scheme. (a and b) Unadjusted and reweighted base-level counts for reads from the WT experiment mapped to the sense strand of a 1-kb coding region in S. cerevisiae (YOL086C). The grey bars near the x-axis indicate unmappable genomic locations. (c) The χ² goodness-of-fit statistics based on unadjusted and reweighted counts for 552 highly expressed regions of constant expression. (d) Smoothed histograms of the reduction in χ² goodness-of-fit statistics when using the reweighting scheme, evaluated in five different experiments. Values greater than zero indicate that the reweighting scheme improves the uniformity of the read distribution.
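
The uniformity comparison in Figure 4 (c and d) can be sketched as follows. This is a minimal illustration that assumes a Pearson goodness-of-fit statistic against a flat expected count over the mappable positions of a region. The function name is hypothetical and the exact statistic used in the paper may differ.

```python
def chi_square_uniformity(counts):
    """Pearson goodness-of-fit statistic of per-position counts
    against a uniform expectation (the mean count per position)."""
    if not counts:
        return 0.0
    expected = sum(counts) / len(counts)
    if expected == 0:
        return 0.0  # no reads in the region: nothing to compare
    return sum((c - expected) ** 2 / expected for c in counts)
```

Under this assumption, the reduction for a region is `chi_square_uniformity(unadjusted) - chi_square_uniformity(reweighted)`, and a value greater than zero means the reweighted counts are closer to uniform.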

References

    1. Mortazavi A, Williams B, McCue K, Schaeffer L, Wold B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat. Methods. 2008;5:621–628.
    2. Dohm JC, Lottaz C, Borodina T, Himmelbauer H. Substantial biases in ultra-short read data sets from high-throughput DNA sequencing. Nucleic Acids Res. 2008;36:e105.
    3. Langmead B, Trapnell C, Pop M, Salzberg S. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10:R25.
    4. Rice P, Longden I, Bleasby A. EMBOSS: The European Molecular Biology Open Software Suite. Trends Genet. 2000;16:276–277.
    5. Gentleman R, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, et al. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 2004;5:R80.