Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
Review
. 2010 Apr;38(6):1767-71.
doi: 10.1093/nar/gkp1137. Epub 2009 Dec 16.

The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants

Affiliations
Review

The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants

Peter J A Cock et al. Nucleic Acids Res. 2010 Apr.

Abstract

FASTQ has emerged as a common file format for sharing sequencing read data combining both the sequence and an associated per base quality score, despite lacking any formal definition to date, and existing in at least three incompatible variants. This article defines the FASTQ format, covering the original Sanger standard, the Solexa/Illumina variants and conversion between them, based on publicly available information such as the MAQ documentation and conventions recently agreed by the Open Bioinformatics Foundation projects Biopython, BioPerl, BioRuby, BioJava and EMBOSS. Being an open access publication, it is hoped that this description, with the example files provided as Supplementary Data, will serve in future as a reference for this important file format.

PubMed Disclaimer

Figures

Figure 1.
Figure 1.
Visual representation of the mapping between PHRED and Solexa quality scores. Vertical layout represents the probability of error on a log scale, therefore the PHRED points are equally spaced (black circles on left), while the Solexa points are not (white circles on right). Solid black lines are reciprocal mappings between scores, and grey arrows are lossy mappings. Near the top of the figure, the black lines are almost horizontal because the two scores are almost equal. The straightforward mapping of higher scores is omitted due to space.

Similar articles

Cited by

References

    1. Pearson WR, Lipman DJ. Improved tools for biological sequence comparison. Proc. Natl Acad. Sci. USA. 1988;85:2444–2448. - PMC - PubMed
    1. Bennett S. Solexa Ltd. Pharmacogenomics. 2004;5:433–438. - PubMed
    1. Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, Bemben LA, Berka J, Braverman MS, Chen JY, Chen Z, et al. Genome sequencing in microfabricated high-density picolitre reactors. Nature. 2005;437:376–380. - PMC - PubMed
    1. Pandey V, Nutter RC, E EP. Applied Biosystems SOLiD system: ligation-based sequencing. In: Janitz M, editor. Next Generation Genome Sequencing: Towards Personalized Medicine. Wiley; 2008. pp. 29–41.
    1. Cock P.JA, Antao T, Chang JT, Chapman BA, Cox CJ, Dalke A, Friedberg I, Hamelryck T, Kauff F, Wilczynski B, et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics. 2009;25:1422–1423. - PMC - PubMed

MeSH terms