The tomato pan-genome uncovers new genes and a rare allele regulating fruit flavor

This article has been updated

Abstract

Modern tomatoes have narrow genetic diversity limiting their improvement potential. We present a tomato pan-genome constructed using genome sequences of 725 phylogenetically and geographically representative accessions, revealing 4,873 genes absent from the reference genome. Presence/absence variation analyses reveal substantial gene loss and intense negative selection of genes and promoters during tomato domestication and improvement. Lost or negatively selected genes are enriched for important traits, especially disease resistance. We identify a rare allele in the TomLoxC promoter selected against during domestication. Quantitative trait locus mapping and analysis of transgenic plants reveal a role for TomLoxC in apocarotenoid production, which contributes to desirable tomato flavor. In orange-stage fruit, accessions harboring both the rare and common TomLoxC alleles (heterozygotes) have higher TomLoxC expression than those homozygous for either and are resurgent in modern tomatoes. The tomato pan-genome adds depth and completeness to the reference genome, and is useful for future biological discovery and breeding.

Fig. 1: Pan-genome of tomato.
Fig. 2: PAVs of genes in wild and cultivated tomatoes.
Fig. 3: Gene selection preference during tomato domestication and improvement.
Fig. 4: Variation of TomLoxC expression under different promoter alleles.
Fig. 5: Involvement of TomLoxC in apocarotenoid biosynthesis.

Data availability

Raw genome and RNA-Seq reads have been deposited into the National Center for Biotechnology Information Sequence Read Archive under accession codes SRP150040, SRP186721 and SRP172989. The nonreference genome sequences and annotated genes of the tomato pan-genome, and the SNPs called from the RIL population, are available via the Dryad Digital Repository (https://doi.org/10.5061/dryad.m463f7k).

Change history

  • 23 May 2019

    In the version of the article originally published, the URL https://doi.org/10.5061/dryad.m463f7k in the ‘Data availability’ section was hyperlinked incorrectly. In addition, the copyright holder was listed as ‘The Author(s)’, but the copyright line should have read ‘This is a U.S. government work and not under copyright protection in the U.S.; foreign copyright protection may apply, 2019’. The errors have been corrected in the HTML and PDF versions of the article.

Acknowledgements

This research was supported by grants from the US National Science Foundation (IOS-1339287 to Z.F. and J.J.G.; IOS-1539831 to Z.F., J.J.G. and H.J.K.; and IOS-1564366 to E.v.d.K., J.C. and D.M.T.), BARD, the US–Israel Binational Agricultural Research and Development Fund, a Vaadia-BARD Postdoctoral Fellowship Award (FI-508-14 to I.G.) and the USDA Agricultural Research Service.

Author information

Contributions

Z.F., J.J.G., H.J.K., S.H. and E.v.d.K. designed and managed the project. I.G., E.A.B.-C., K.A.S., T.L.F., G.L.S., T.W.T., D.M.T., Y.X., M.J.D., J.B., J.C., M.R.F. and E.v.d.K. collected samples and performed experiments. L.G., I.G., H.S., Q.M. and K.B. performed data analyses. L.G. and I.G. wrote the manuscript. Z.F. and J.J.G. revised the manuscript.

Corresponding authors

Correspondence to James J. Giovannoni or Zhangjun Fei.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Figs. 1–14 and Supplementary Note

Reporting Summary

Supplementary Tables

Supplementary Tables 1–20

About this article

Cite this article

Gao, L., Gonda, I., Sun, H. et al. The tomato pan-genome uncovers new genes and a rare allele regulating fruit flavor. Nat Genet 51, 1044–1051 (2019). https://doi.org/10.1038/s41588-019-0410-2

  • DOI: https://doi.org/10.1038/s41588-019-0410-2
