PLoS One. 2012;7(2):e30619.
doi: 10.1371/journal.pone.0030619. Epub 2012 Feb 1.

NGS QC Toolkit: a toolkit for quality control of next generation sequencing data

Ravi K Patel et al. PLoS One. 2012.

Abstract

Next generation sequencing (NGS) technologies provide a high-throughput means to generate large amounts of sequence data. However, quality control (QC) of sequence data generated from these technologies is extremely important for meaningful downstream analysis. Further, highly efficient and fast processing tools are required to handle the large volume of datasets. Here, we have developed an application, NGS QC Toolkit, for quality check and filtering of high-quality data. This toolkit is a standalone and open source application freely available at http://www.nipgr.res.in/ngsqctoolkit.html. All the tools in the application have been implemented in the Perl programming language. The toolkit comprises user-friendly tools for QC of sequencing data generated using Roche 454 and Illumina platforms, and additional tools to aid QC (sequence format converter and trimming tools) and analysis (statistics tools). A variety of options have been provided to facilitate the QC at user-defined parameters. The toolkit is expected to be very useful for the QC of NGS data to facilitate better downstream analysis.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Flow chart showing various tools included in NGS QC Toolkit.
The tools have been grouped into QC tools, trimming tools, format converters and statistics tools.
Figure 2. Workflow of the QC tools for Illumina (IlluQC) and Roche 454 (454QC) data.
IlluQC tools process FASTQ files containing paired-end (PE) and/or single-end (SE) reads. After filtering low-quality reads and reads containing primer/adaptor contamination as per given criteria, high-quality (HQ) reads and QC statistics are generated in the output folder. 454QC tools process SE and PE sequence and quality files in FASTA format. After optional trimming of reads containing homopolymers, low-quality reads are removed and reads containing primer/adaptor contamination are trimmed as per given criteria. Each of these steps is followed by filtering of reads by a given length cut-off. Finally, HQ reads and QC statistics are generated in the output folder.
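The Illumina branch of this workflow can be illustrated with a minimal Python sketch. It is not the toolkit's Perl implementation. The function names, the Phred+33 quality encoding, the default thresholds (`min_phred=20`, `min_hq_fraction=0.7`), and the exact-substring adaptor check are all assumptions made for illustration, since the caption only states that filtering follows "given criteria".

```python
from typing import Dict, Iterable, Iterator, List, Sequence, Tuple

Read = Tuple[str, str, str]  # (header, sequence, quality string)


def parse_fastq(text: str) -> Iterator[Read]:
    """Yield reads from well-formed 4-line FASTQ records (blank lines ignored)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    for i in range(0, len(lines) - 3, 4):
        # Record layout: header, sequence, '+' separator, quality string.
        yield lines[i], lines[i + 1], lines[i + 3]


def hq_fraction(qual: str, min_phred: int, offset: int = 33) -> float:
    """Fraction of bases whose Phred score is at least min_phred.

    The Phred+33 offset is an assumption about the input encoding.
    """
    if not qual:
        return 0.0
    good = sum(1 for ch in qual if ord(ch) - offset >= min_phred)
    return good / len(qual)


def has_adaptor(seq: str, adaptors: Sequence[str]) -> bool:
    """Hypothetical contamination check: exact substring match only."""
    return any(a in seq for a in adaptors)


def filter_reads(
    reads: Iterable[Read],
    min_phred: int = 20,            # assumed threshold, for illustration
    min_hq_fraction: float = 0.7,   # assumed threshold, for illustration
    adaptors: Sequence[str] = (),
) -> Tuple[List[Read], Dict[str, int]]:
    """Drop low-quality and contaminated reads; return HQ reads and counts."""
    kept: List[Read] = []
    stats = {"total": 0, "low_quality": 0, "contaminated": 0, "high_quality": 0}
    for header, seq, qual in reads:
        stats["total"] += 1
        if hq_fraction(qual, min_phred) < min_hq_fraction:
            stats["low_quality"] += 1
        elif has_adaptor(seq, adaptors):
            stats["contaminated"] += 1
        else:
            stats["high_quality"] += 1
            kept.append((header, seq, qual))
    return kept, stats


# Three made-up reads: one clean, one low-quality, one carrying a made-up adaptor.
EXAMPLE_FASTQ = """@read1
ACGTACGTAC
+
IIIIIIIIII
@read2
ACGTACGTAC
+
###IIII###
@read3
ACGTTTAGGCAC
+
IIIIIIIIIIII
"""

hq_reads, qc_stats = filter_reads(parse_fastq(EXAMPLE_FASTQ), adaptors=["TTAGGC"])
```

Here `hq_reads` holds only the reads that pass both checks, and `qc_stats` carries the per-category counts that a summary chart could be drawn from. A real tool would also need to handle paired-end files and inexact adaptor matches, which this sketch leaves out.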
Figure 3. Snapshots showing graphs of various QC statistics generated as output by QC tools.
(A) Average quality score for each base position, (B) GC content distribution, (C) Average Phred quality score distribution, (D) Base composition and (E) Read length distribution for both input (red) and HQ filtered (green) data. (F) Percentage of reads with different quality score ranges at each base position. (G,H) Pie charts show summary of QC analysis of Illumina (G) and Roche 454 (H) data.
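Several of these statistics are simple to compute from the reads themselves. The Python sketch below covers three of them: average quality per base position (panel A), per-read GC content (the values behind panel B), and the read length distribution (panel E). The function names and the Phred+33 encoding are assumptions for illustration, not the toolkit's own code.

```python
from collections import Counter
from typing import Iterable, List


def mean_quality_per_position(quals: Iterable[str], offset: int = 33) -> List[float]:
    """Average Phred score at each base position across reads of any length.

    Assumes Phred+33 encoded quality strings.
    """
    totals: List[int] = []
    counts: List[int] = []
    for qual in quals:
        for i, ch in enumerate(qual):
            if i == len(totals):
                # First read long enough to reach this position.
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - offset
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in one read (0.0 for an empty read)."""
    if not seq:
        return 0.0
    s = seq.upper()
    return 100.0 * (s.count("G") + s.count("C")) / len(s)


def length_distribution(seqs: Iterable[str]) -> Counter:
    """Number of reads observed at each read length."""
    return Counter(len(s) for s in seqs)
```

Running the same functions on the input reads and on the HQ filtered reads would give the two series that the figure overlays in red and green.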
