Abstract
Modern tomatoes have narrow genetic diversity, which limits their potential for improvement. We present a tomato pan-genome constructed from the genome sequences of 725 phylogenetically and geographically representative accessions, revealing 4,873 genes absent from the reference genome. Presence/absence variation analyses show substantial gene loss and intense negative selection of genes and promoters during tomato domestication and improvement. Lost or negatively selected genes are enriched for important traits, especially disease resistance. We identify a rare allele in the TomLoxC promoter that was selected against during domestication. Quantitative trait locus mapping and analysis of transgenic plants reveal a role for TomLoxC in apocarotenoid production, which contributes to desirable tomato flavor. In orange-stage fruit, accessions harboring both the rare and common TomLoxC alleles (heterozygotes) have higher TomLoxC expression than those homozygous for either allele, and such heterozygotes are resurgent in modern tomatoes. The tomato pan-genome adds depth and completeness to the reference genome and is useful for future biological discovery and breeding.
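Presence/absence variation (PAV) analysis starts from a matrix recording whether each pan-genome gene is detected in each accession. The sketch below illustrates the idea on a hypothetical toy matrix: it bins genes as core, dispensable or private by how many accessions carry them, and flags genes whose presence frequency drops from an ancestral group to a derived group. The accession names, group labels, the `min_drop` threshold and all function names are illustrative assumptions; this is not the authors' pipeline, whose cut-offs and statistical tests are not described in this section.

```python
"""Toy presence/absence variation (PAV) summary for a pan-genome.

Hypothetical illustration only: accession names, group labels and the
min_drop threshold are assumptions, not values from the study.
"""
from typing import Dict, List

# gene -> accession -> 1 (present) or 0 (absent)
PAV = Dict[str, Dict[str, int]]

# Hypothetical accession groups (e.g. wild ancestors vs. cultivated lines).
WILD = ["W1", "W2", "W3"]
CULTIVATED = ["C1", "C2", "C3"]

TOY_PAV: PAV = {
    # present in every accession
    "geneA": {"W1": 1, "W2": 1, "W3": 1, "C1": 1, "C2": 1, "C3": 1},
    # common in wild accessions, mostly absent from cultivated ones
    "geneB": {"W1": 1, "W2": 1, "W3": 1, "C1": 0, "C2": 0, "C3": 1},
    # found in a single accession
    "geneC": {"W1": 0, "W2": 0, "W3": 1, "C1": 0, "C2": 0, "C3": 0},
}


def presence_frequency(pav: PAV, gene: str, accessions: List[str]) -> float:
    """Fraction of `accessions` in which `gene` is scored present."""
    calls = [pav[gene][acc] for acc in accessions]
    return sum(calls) / len(calls)


def classify_gene(pav: PAV, gene: str, accessions: List[str]) -> str:
    """Bin a gene by how many of the given accessions carry it."""
    n_present = sum(pav[gene][acc] for acc in accessions)
    if n_present == len(accessions):
        return "core"
    if n_present == 0:
        return "absent"
    if n_present == 1:
        return "private"
    return "dispensable"


def flag_lost_genes(pav: PAV, ancestral: List[str], derived: List[str],
                    min_drop: float = 0.5) -> List[str]:
    """Genes whose presence frequency falls by at least `min_drop`
    from the ancestral group to the derived group, sorted by name."""
    lost = []
    for gene in pav:
        drop = (presence_frequency(pav, gene, ancestral)
                - presence_frequency(pav, gene, derived))
        if drop >= min_drop:
            lost.append(gene)
    return sorted(lost)


if __name__ == "__main__":
    everyone = WILD + CULTIVATED
    for gene in TOY_PAV:
        print(gene, classify_gene(TOY_PAV, gene, everyone))
    print("lost:", flag_lost_genes(TOY_PAV, WILD, CULTIVATED))
```

On the toy data, geneA is classified as core, geneB as dispensable and geneC as private, and only geneB is flagged as lost. A raw frequency difference keeps the example short; with hundreds of accessions one would normally pair it with a statistical test.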
Data availability
Raw genome and RNA-Seq reads have been deposited in the National Center for Biotechnology Information Sequence Read Archive under accession codes SRP150040, SRP186721 and SRP172989. The nonreference genome sequences and annotated genes of the tomato pan-genome, as well as SNPs called from the RIL population, are available via the Dryad Digital Repository (https://doi.org/10.5061/dryad.m463f7k).
Change history
23 May 2019
In the version of the article originally published, the URL https://doi.org/10.5061/dryad.m463f7k in the ‘Data availability’ section was hyperlinked incorrectly. In addition, the copyright holder was listed as ‘The Author(s)’, but the copyright line should have read ‘This is a U.S. government work and not under copyright protection in the U.S.; foreign copyright protection may apply, 2019’. The errors have been corrected in the HTML and PDF versions of the article.
Acknowledgements
This research was supported by grants from the US National Science Foundation (IOS-1339287 to Z.F. and J.J.G.; IOS-1539831 to Z.F., J.J.G. and H.J.K.; and IOS-1564366 to E.v.d.K., J.C. and D.M.T.), BARD, the US–Israel Binational Agricultural Research and Development Fund, a Vaadia-BARD Postdoctoral Fellowship Award (FI-508-14 to I.G.) and the USDA Agricultural Research Service.
Author information
Contributions
Z.F., J.J.G., H.J.K., S.H. and E.v.d.K. designed and managed the project. I.G., E.A.B.-C., K.A.S., T.L.F., G.L.S., T.W.T., D.M.T., Y.X., M.J.D., J.B., J.C., M.R.F. and E.v.d.K. collected samples and performed experiments. L.G., I.G., H.S., Q.M. and K.B. performed data analyses. L.G. and I.G. wrote the manuscript. Z.F. and J.J.G. revised the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Figs. 1–14 and Supplementary Note
Supplementary Tables
Supplementary Tables 1–20
About this article
Cite this article
Gao, L., Gonda, I., Sun, H. et al. The tomato pan-genome uncovers new genes and a rare allele regulating fruit flavor. Nat Genet 51, 1044–1051 (2019). https://doi.org/10.1038/s41588-019-0410-2