The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants
- PMID: 20015970
- PMCID: PMC2847217
- DOI: 10.1093/nar/gkp1137
Abstract
FASTQ has emerged as a common file format for sharing sequencing read data, combining both the sequence and an associated per-base quality score, despite lacking any formal definition to date and existing in at least three incompatible variants. This article defines the FASTQ format, covering the original Sanger standard, the Solexa/Illumina variants and conversion between them, based on publicly available information such as the MAQ documentation and conventions recently agreed by the Open Bioinformatics Foundation projects Biopython, BioPerl, BioRuby, BioJava and EMBOSS. As this is an open access publication, it is hoped that this description, together with the example files provided as Supplementary Data, will serve in future as a reference for this important file format.
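To make the incompatibility between the variants concrete, the following is a minimal Python sketch of decoding a quality string and re-encoding it in the Sanger variant. The abstract does not state the encodings, so the details here are assumptions based on the commonly documented conventions: Sanger stores PHRED scores with an ASCII offset of 33, Solexa stores Solexa-scaled scores with an offset of 64, and Illumina 1.3+ stores PHRED scores with an offset of 64. The function names are hypothetical and are not taken from the article or from any Bio* library.

```python
import math

# Assumed ASCII offsets for each variant (not stated in the abstract).
OFFSETS = {"sanger": 33, "solexa": 64, "illumina": 64}


def solexa_to_phred(q_solexa):
    """Map a Solexa-scaled score onto the PHRED scale.

    Assumes the usual relation Q_phred = 10 * log10(10^(Q_solexa / 10) + 1).
    """
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


def decode_quality(quality, variant="sanger"):
    """Return integer PHRED scores for a quality string in the given variant."""
    offset = OFFSETS[variant]
    scores = [ord(ch) - offset for ch in quality]
    if variant == "solexa":
        # Solexa scores use a different scale, so they are converted and rounded.
        scores = [round(solexa_to_phred(q)) for q in scores]
    return scores


def to_sanger(quality, variant):
    """Re-encode a quality string from the given variant as Sanger (offset 33)."""
    # Scores are capped at 93 so the result stays within printable ASCII.
    return "".join(chr(min(q, 93) + 33) for q in decode_quality(quality, variant))
```

Under these assumptions, the same four characters mean different things in different variants: `decode_quality("!+5I", "sanger")` yields the PHRED scores 0, 10, 20 and 40, while the Illumina 1.3+ string `"@JTh"` encodes the same scores and `to_sanger("@JTh", "illumina")` converts it to `"!+5I"`. For real data, an established parser such as one from the Bio* projects named in the abstract is a safer choice than a hand-rolled sketch like this one.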