Abstract
Environmental RNA viruses are ubiquitous and diverse, and probably have important ecological and biogeochemical impacts. Understanding the global diversity of RNA viruses is limited by sampling biases, dependence on cell culture and PCR for virus discovery, and a focus on viruses pathogenic to humans or economically important animals and plants. To address this knowledge gap, we generated metatranscriptomic sequence data from 32 diverse environments in 16 provinces and regions of China. We identified 6,624 putatively novel virus operational taxonomic units from soil, sediment and faecal samples, greatly expanding known diversity of the RNA virosphere. These newly identified viruses included positive-sense, negative-sense and double-strand RNA viruses from at least 62 families. Sediments and animal faeces were rich sources of viruses. Virome compositions were affected by local environmental factors, including organic content and eukaryote species abundance. Notably, environmental factors had a greater impact on the abundance and diversity of plant, fungal and bacterial viruses than of animal viromes. Our data confirm that RNA viruses are an integral part of both terrestrial and aquatic ecosystems.
Data availability
The sequence reads generated in this study are available at the NCBI Sequence Read Archive (SRA) database under BioProject accession PRJNA716119. All viral sequences generated in this study have been deposited in GenBank under accession numbers (https://www.ncbi.nlm.nih.gov/nuccore?term=716119%5BBioProject%5D) MW784004-MW784109, MW896840-MW897324, MZ218144-MZ218759, MZ556337-MZ556592, MZ678955-MZ680357, ON049747-ON050964, ON161767-ON164489. All other data are available in the paper or in the supplementary materials. The CheckV database used for viral genome quality and completeness estimation can be accessed via https://bitbucket.org/berkeleylab/checkv. The Conserved Domain Database (CDD) used for ORF annotation can be accessed via https://www.ncbi.nlm.nih.gov/cdd/. The UniRef30_2021_03 database used in HHblits analysis can be accessed via http://wwwuser.gwdg.de/~compbiol/uniclust/2021_03/. The SILVA database used for rRNA removal can be accessed via https://www.arb-silva.de/. Source data are provided with this paper.
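For readers who want to work with the deposited sequences individually, the GenBank accession ranges listed above can be expanded into explicit accession lists. The following is a minimal sketch, not part of the authors' pipeline; the function name `expand_accession_range` is hypothetical, and it assumes each range shares one letter prefix and a fixed-width numeric part.

```python
import re

# Accession ranges copied from the Data availability statement.
RANGES = [
    "MW784004-MW784109",
    "MW896840-MW897324",
    "MZ218144-MZ218759",
    "MZ556337-MZ556592",
    "MZ678955-MZ680357",
    "ON049747-ON050964",
    "ON161767-ON164489",
]


def expand_accession_range(span: str) -> list[str]:
    """Expand 'MW784004-MW784109' into every accession in the inclusive range.

    Hypothetical helper: assumes both ends share the same letter prefix and
    that the numeric part keeps the zero-padded width of the start accession.
    """
    start, end = span.split("-")
    m_start = re.fullmatch(r"([A-Z]+)(\d+)", start)
    m_end = re.fullmatch(r"([A-Z]+)(\d+)", end)
    if not m_start or not m_end or m_start.group(1) != m_end.group(1):
        raise ValueError(f"not a same-prefix accession range: {span}")
    prefix = m_start.group(1)
    width = len(m_start.group(2))
    lo, hi = int(m_start.group(2)), int(m_end.group(2))
    if lo > hi:
        raise ValueError(f"range start exceeds range end: {span}")
    return [f"{prefix}{n:0{width}d}" for n in range(lo, hi + 1)]


# Flatten all ranges into one list of individual accessions.
accessions = [acc for span in RANGES for acc in expand_accession_range(span)]
```

The resulting list could then be passed to whichever retrieval tool is preferred; no particular download client is assumed here.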
Acknowledgements
We thank D.-X. Wang, K. Li, W.-B. Zhao, X.-N. Diao, A.-J. Gong, Y.-L. Zhang, J.-B. Wang, H. Luo, D.-A. Zhang, Y.-Q. Zhao and M. Li for their contributions to sample collection, and X.-Q. Luo, R.-X. Hu, M.-Z. Liu, J. Liu, Y. Jiang, J.-J. Guo, J.-J. Wang and P. Lu for assisting with PCR confirmations. This study was supported by the National Natural Science Foundation of China (grant nos. 32130002, 31930001, 32041004, 81861138003 and 81672057 to Y.-Z.Z.) and the National Key R&D Program of China (2016YFC1201900 to Y.-Z.Z.). E.C.H. was supported by an ARC Australian Laureate Fellowship (FL170100022).
Author information
Contributions
Y.-Z.Z. conceived and designed the study. Y.-M.C., J.-H.T., X.C., X.-D.L., J.-J.S., H.C., Z.-Y.H., W.-D.Y., Z.-C.Z., J.W., F.L., H.-W.W. and Q.-Y.X. performed sample collection and geographic information recording. S.S., Y.-M.C., M.W. and E.C.H. analysed the data. Y.-M.C., W.-H.G. and W.W. performed the experiments. S.S., Y.-M.C., E.C.H. and Y.-Z.Z. wrote the paper with input from all authors. Y.-Z.Z. led the study.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Microbiology thanks the anonymous reviewers for their contribution to the peer review of this work. Peer reviewer reports are available.
Extended data
Extended Data Fig. 1 Environments and Chinese provinces sampled in this study.
Extended Data Fig. 2 Viral composition of each library.
Relative proportions were determined by the number of reads corresponding to contigs with viral hits to each viral clade as a proportion of total viral reads in each of 442 biologically independent samples (analysis performed using DIAMOND BLASTX).
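The relative-proportion calculation described in this caption can be sketched as follows. This is an illustrative example only: the function name `clade_proportions`, the clade labels and the read counts are all made up, and it assumes that per-clade read counts for a library have already been obtained from the DIAMOND BLASTX hits.

```python
def clade_proportions(read_counts: dict[str, int]) -> dict[str, float]:
    """Return each clade's share of the total viral reads in one library.

    `read_counts` maps a viral clade name to the number of reads mapping to
    contigs with viral hits to that clade (hypothetical input format).
    """
    total = sum(read_counts.values())
    if total == 0:
        # A library with no viral reads has no meaningful composition.
        return {clade: 0.0 for clade in read_counts}
    return {clade: n / total for clade, n in read_counts.items()}


# Made-up counts for a single library, for illustration only.
example_library = {"clade_A": 600, "clade_B": 300, "clade_C": 100}
proportions = clade_proportions(example_library)
```

Because every library is normalised by its own total viral reads, the proportions within a library sum to 1 and libraries of different sequencing depth can be compared side by side.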
Extended Data Fig. 3 Ecological factors significantly associated with viral abundance in environmental viruses.
(A) Model-averaged effect sizes of ecological metadata on viral abundance for 265 biologically independent samples, shown with 95% confidence intervals. Factors significant in the best subset selected model (p < 0.05, in orange) have intervals that do not overlap the central line and are denoted with an asterisk (*) (identical to Fig. 6a). Boxplots represent abundance values plotted against (B) environment, (C) location, (D) total phosphorus, (E) total potassium, (F) available phosphorus, (G) available potassium, (H) organic content, (I) eukaryote species abundance, and (J) eukaryote species richness. For all boxplots, the horizontal box lines represent the first quartile, the median, and the third quartile. Whiskers denote the range of points within the first quartile − 1.5× the interquartile range and the third quartile + 1.5× the interquartile range.
Extended Data Fig. 4 Ecological factors significantly associated with viral abundance in animal-associated viruses.
(A) Model-averaged effect sizes of ecological metadata on viral abundance for 265 biologically independent samples, shown with 95% confidence intervals. Factors significant in the best subset selected model (p < 0.05, in orange) have intervals that do not overlap the central line and are denoted with an asterisk (*) (identical to Fig. 7a). Boxplots represent abundance values plotted against (B) location, (C) organic content, and (D) eukaryote species abundance. For all boxplots, the horizontal box lines represent the first quartile, the median, and the third quartile. Whiskers denote the range of points within the first quartile − 1.5× the interquartile range and the third quartile + 1.5× the interquartile range.
Extended Data Fig. 5 Ecological factors significantly associated with Shannon diversity in environmental viruses.
(A) Model-averaged effect sizes of ecological metadata on Shannon diversity for 265 biologically independent samples, shown with 95% confidence intervals. Factors significant in the best subset selected model (p < 0.05, in orange) have intervals that do not overlap the central line and are denoted with an asterisk (*) (identical to Fig. 6c). Boxplots represent Shannon diversity values plotted against (B) environment, (C) location, (D) pH, (E) total nitrogen, (F) total potassium, (G) available nitrogen, (H) organic content, and (I) eukaryote species richness. For all boxplots, the horizontal box lines represent the first quartile, the median, and the third quartile. Whiskers denote the range of points within the first quartile − 1.5× the interquartile range and the third quartile + 1.5× the interquartile range.
Extended Data Fig. 6 Ecological factors significantly associated with true diversity in environmental viruses.
(A) Model-averaged effect sizes of ecological metadata on true diversity for 265 biologically independent samples, shown with 95% confidence intervals. Factors significant in the best subset selected model (p < 0.05, in orange) have intervals that do not overlap the central line and are denoted with an asterisk (*) (identical to Fig. 6d). Boxplots represent true diversity values plotted against (B) environment, (C) location, (D) pH, (E) total nitrogen, (F) total potassium, (G) available nitrogen, (H) organic content, (I) eukaryote species richness, and (J) eukaryote species true diversity. For all boxplots, the horizontal box lines represent the first quartile, the median, and the third quartile. Whiskers denote the range of points within the first quartile − 1.5× the interquartile range and the third quartile + 1.5× the interquartile range.
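The captions refer to both Shannon diversity and true diversity without defining them. The sketch below shows one common pair of definitions and is offered as an assumption, not as the authors' stated method: the Shannon index is taken as H = −Σ p·ln(p) over the relative abundances p, and true diversity is taken as exp(H), the effective number of equally abundant taxa. The function names are hypothetical.

```python
import math


def shannon_index(counts: list[float]) -> float:
    """Shannon index H = -sum(p * ln p) over non-zero relative abundances."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)


def true_diversity(counts: list[float]) -> float:
    """Effective number of taxa, assumed here to be exp(Shannon index)."""
    return math.exp(shannon_index(counts))


# Made-up abundances: four equally abundant viruses in one sample.
even_sample = [5, 5, 5, 5]
h_even = shannon_index(even_sample)
d_even = true_diversity(even_sample)
```

Under this assumed definition, a sample of four equally abundant viruses has a true diversity of 4, whereas a sample dominated by one virus yields a value closer to 1, which makes the measure easier to interpret than the raw index.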
Extended Data Fig. 7 Ecological factors significantly associated with Shannon diversity in animal-associated viruses.
(A) Model-averaged effect sizes of ecological metadata on Shannon diversity for 265 biologically independent samples, shown with 95% confidence intervals. Factors significant in the best subset selected model (p < 0.05, in orange) have intervals that do not overlap the central line and are denoted with an asterisk (*) (identical to Fig. 7c). Boxplots represent Shannon diversity values plotted against (B) environment. For all boxplots, the horizontal box lines represent the first quartile, the median, and the third quartile. Whiskers denote the range of points within the first quartile − 1.5× the interquartile range and the third quartile + 1.5× the interquartile range.
Extended Data Fig. 8 Ecological factors significantly associated with true diversity in animal-associated viruses.
(A) Model-averaged effect sizes of ecological metadata on true diversity for 265 biologically independent samples, shown with 95% confidence intervals. Factors significant in the best subset selected model (p < 0.05, in orange) have intervals that do not overlap the central line and are denoted with an asterisk (*) (identical to Fig. 7d). Boxplots represent true diversity values plotted against (B) environment. For all boxplots, the horizontal box lines represent the first quartile, the median, and the third quartile. Whiskers denote the range of points within the first quartile − 1.5× the interquartile range and the third quartile + 1.5× the interquartile range.
Extended Data Fig. 9 Ecological factors significantly associated with viral richness in environmental viruses.
(A) Model-averaged effect sizes of ecological metadata on viral richness for 265 biologically independent samples, shown with 95% confidence intervals. Factors significant in the best subset selected model (p < 0.05, in orange) have intervals that do not overlap the central line and are denoted with an asterisk (*) (identical to Fig. 6b). Boxplots represent richness values plotted against (B) environment, (C) location, (D) pH, (E) total phosphorus, (F) available nitrogen, (G) organic content, (H) eukaryote species richness, (I) eukaryote species Shannon diversity, and (J) eukaryote species true diversity. For all boxplots, the horizontal box lines represent the first quartile, the median, and the third quartile. Whiskers denote the range of points within the first quartile − 1.5× the interquartile range and the third quartile + 1.5× the interquartile range.
Extended Data Fig. 10 Ecological factors significantly associated with viral richness in animal-associated viruses.
(A) Model-averaged effect sizes of ecological metadata on viral richness for 265 biologically independent samples, shown with 95% confidence intervals. Factors significant in the best subset selected model (p < 0.05, in orange) have intervals that do not overlap the central line and are denoted with an asterisk (*) (identical to Fig. 7b). Boxplots represent richness values plotted against (B) environment, (C) location, (D) pH, (E) total phosphorus, (F) organic content, (G) eukaryote species richness, and (H) eukaryote species true diversity. For all boxplots, the horizontal box lines represent the first quartile, the median, and the third quartile. Whiskers denote the range of points within the first quartile − 1.5× the interquartile range and the third quartile + 1.5× the interquartile range.
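The boxplot convention used throughout the Extended Data captions (box at the quartiles, whiskers limited to 1.5× the interquartile range beyond the box) can be illustrated with a short sketch. The function name `box_stats` and the example values are hypothetical, and the quartile interpolation rule is an assumption, since plotting packages differ slightly in how they compute quartiles.

```python
import statistics


def box_stats(values: list[float]) -> dict:
    """Summarise one boxplot: quartiles, whisker ends and outlying points.

    Needs at least two values. Quartiles use the 'inclusive' interpolation
    of statistics.quantiles, which is an assumed convention.
    """
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme observations still inside the fences.
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }


# Made-up values with one extreme observation.
stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
```

In this made-up example the value 100 lies beyond the upper fence, so the upper whisker stops at 9 and 100 would be drawn as a separate point.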
Supplementary information
Supplementary Information
Supplementary Figs. 1–38.
Supplementary Table 1
Supplementary Tables 1–3.
Supplementary Data 1
Statistical source data for Supplementary Fig. 36.
Supplementary Data 2
Statistical source data for Supplementary Fig. 37.
Supplementary Data 3
Statistical source data for Supplementary Fig. 38.
Source data
Source Data Fig. 2
Statistical source data for Fig. 2.
Source Data Fig. 6
Statistical source data for Fig. 6.
Source Data Fig. 7
Statistical source data for Fig. 7.
Source Data Extended Data Fig. 2
Statistical source data for Extended Data Fig. 2.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Chen, YM., Sadiq, S., Tian, JH. et al. RNA viromes from terrestrial sites across China expand environmental viral diversity. Nat Microbiol 7, 1312–1323 (2022). https://doi.org/10.1038/s41564-022-01180-2
DOI: https://doi.org/10.1038/s41564-022-01180-2