  • Resource

RNA viromes from terrestrial sites across China expand environmental viral diversity

Abstract

Environmental RNA viruses are ubiquitous and diverse, and probably have important ecological and biogeochemical impacts. Understanding the global diversity of RNA viruses is limited by sampling biases, dependence on cell culture and PCR for virus discovery, and a focus on viruses pathogenic to humans or economically important animals and plants. To address this knowledge gap, we generated metatranscriptomic sequence data from 32 diverse environments in 16 provinces and regions of China. We identified 6,624 putatively novel virus operational taxonomic units from soil, sediment and faecal samples, greatly expanding known diversity of the RNA virosphere. These newly identified viruses included positive-sense, negative-sense and double-strand RNA viruses from at least 62 families. Sediments and animal faeces were rich sources of viruses. Virome compositions were affected by local environmental factors, including organic content and eukaryote species abundance. Notably, environmental factors had a greater impact on the abundance and diversity of plant, fungal and bacterial viruses than of animal viromes. Our data confirm that RNA viruses are an integral part of both terrestrial and aquatic ecosystems.

Fig. 1: Map of sample locations and sample types.
Fig. 2: Viral composition by local environment.
Fig. 3: Phylogenetic trees of RNA viruses based on the RdRp domain.
Fig. 4: Genome organization of RNA viruses.
Fig. 5: Viral abundance and diversity of each library.
Fig. 6: Determinants of viral abundance and diversity in environmental viruses.
Fig. 7: Determinants of viral abundance and diversity in animal-associated viruses.

Data availability

The sequence reads generated in this study are available in the NCBI Sequence Read Archive (SRA) database under BioProject accession PRJNA716119. All viral sequences generated in this study have been deposited in GenBank (https://www.ncbi.nlm.nih.gov/nuccore?term=716119%5BBioProject%5D) under accession numbers MW784004–MW784109, MW896840–MW897324, MZ218144–MZ218759, MZ556337–MZ556592, MZ678955–MZ680357, ON049747–ON050964 and ON161767–ON164489. All other data are available in the paper or in the supplementary materials. The CheckV database used to estimate viral genome quality and completeness can be accessed via https://bitbucket.org/berkeleylab/checkv. The Conserved Domain Database (CDD) used for ORF annotation can be accessed via https://www.ncbi.nlm.nih.gov/cdd/. The UniRef30_2021_03 database used in the HHblits analysis can be accessed via http://wwwuser.gwdg.de/~compbiol/uniclust/2021_03/. The SILVA database used for rRNA removal can be accessed via https://www.arb-silva.de/. Source data are provided with this paper.

Acknowledgements

We thank D.-X. Wang, K. Li, W.-B. Zhao, X.-N. Diao, A.-J. Gong, Y.-L. Zhang, J.-B. Wang, H. Luo, D.-A. Zhang, Y.-Q. Zhao and M.-Li for their contributions to sample collection, and X.-Q. Luo, R.-X. Hu, M.-Z. Liu, J. Liu, Y. Jiang, J.-J. Guo, J.-J. Wang and P. Lu for assisting with PCR confirmations. This study was supported by the National Natural Science Foundation of China (grant nos. 32130002, 31930001, 32041004, 81861138003 and 81672057 to Y.-Z.Z.) and the National Key R&D Program of China (2016YFC1201900 to Y.-Z.Z.). E.C.H. was supported by an ARC Australian Laureate Fellowship (FL170100022).

Author information

Contributions

Y.-Z.Z. conceived and designed the study. Y.-M.C., J.-H.T., X.C., X.-D.L., J.-J.S., H.C., Z.-Y.H., W.-D.Y., Z.-C.Z., J.W., F.L., H.-W.W. and Q.-Y.X. performed sample collection and geographic information recording. S.S., Y.-M.C., M.W. and E.C.H. analysed the data. Y.-M.C., W.-H.G. and W.W. performed the experiments. S.S., Y.-M.C., E.C.H. and Y.-Z.Z. wrote the paper with input from all authors. Y.-Z.Z. led the study.

Corresponding author

Correspondence to Yong-Zhen Zhang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Microbiology thanks the anonymous reviewers for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Environments and Chinese provinces sampled in this study.

Extended Data Fig. 2 Viral composition of each library.

Relative proportions were determined by the number of reads corresponding to contigs with viral hits to each viral clade as a proportion of total viral reads in each of 442 biologically independent samples (analysis performed using DIAMOND BLASTX).
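The proportion calculation described in this caption can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name, the two input dictionaries and the example read counts are hypothetical, and the sketch assumes that per-contig read counts and clade assignments (for example, from DIAMOND BLASTX hits) have already been tabulated.

```python
from collections import defaultdict


def clade_proportions(contig_reads, contig_clade):
    """Return each viral clade's share of the total viral reads in one library.

    contig_reads: dict mapping contig ID -> number of mapped reads (hypothetical input)
    contig_clade: dict mapping contig ID -> viral clade of its best hit;
                  contigs without a viral hit are simply absent (hypothetical input)
    """
    reads_per_clade = defaultdict(int)
    for contig, reads in contig_reads.items():
        clade = contig_clade.get(contig)
        if clade is None:
            # No viral hit: the contig contributes neither to a clade
            # nor to the total-viral-reads denominator.
            continue
        reads_per_clade[clade] += reads

    total_viral_reads = sum(reads_per_clade.values())
    if total_viral_reads == 0:
        return {}
    return {clade: n / total_viral_reads for clade, n in reads_per_clade.items()}


# Made-up counts for a single library, for illustration only.
example_reads = {"contig_1": 600, "contig_2": 300, "contig_3": 100, "contig_4": 50}
example_clades = {
    "contig_1": "Picornavirales",
    "contig_2": "Picornavirales",
    "contig_3": "Narnaviridae",
}
proportions = clade_proportions(example_reads, example_clades)
```

Because the denominator is the total of viral reads only, the proportions within a library always sum to 1 regardless of how many non-viral reads the library contains.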

Extended Data Fig. 3 Ecological factors significantly associated with viral abundance in environmental viruses.

(A) Model-averaged effect sizes, with 95% confidence intervals, of ecological metadata on viral abundance for 265 biologically independent samples. Factors significant in the best-subset-selected model (p < 0.05, in orange) have intervals that do not overlap the central line and are denoted with an asterisk (*) (identical to Fig. 6a). Boxplots represent abundance values plotted against (B) environment, (C) location, (D) total phosphorus, (E) total potassium, (F) available phosphorus, (G) available potassium, (H) organic content, (I) eukaryote species abundance, and (J) eukaryote species richness. For all boxplots, the horizontal box lines represent the first quartile, the median, and the third quartile. Whiskers denote the range of points between the first quartile − 1.5× the interquartile range and the third quartile + 1.5× the interquartile range.
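The boxplot convention used in these captions (box lines at the quartiles, whiskers limited to points within 1.5× the interquartile range of the box) can be sketched as follows. This is an illustration under stated assumptions rather than the plotting code behind the figures: the function name and example values are hypothetical, and the quartiles use Python's "inclusive" interpolation, which may differ slightly from the convention of the plotting software the authors used.

```python
import statistics


def boxplot_summary(values):
    """Compute quartiles, whisker ends and outliers for one boxplot.

    Requires at least two data points (a limitation of statistics.quantiles
    in older Python versions).
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr

    # Whiskers extend to the most extreme data points that lie inside the fences.
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = sorted(v for v in values if v < lower_fence or v > upper_fence)
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


# Made-up values, for illustration only.
summary = boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 40])
```

In this example the value 40 lies above the upper fence, so the upper whisker stops at 9 and 40 would be drawn as an individual point.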

Extended Data Fig. 4 Ecological factors significantly associated with viral abundance in animal-associated viruses.

(A) Model-averaged effect sizes, with 95% confidence intervals, of ecological metadata on viral abundance for 265 biologically independent samples. Factors significant in the best-subset-selected model (p < 0.05, in orange) have intervals that do not overlap the central line and are denoted with an asterisk (*) (identical to Fig. 7a). Abundance values are plotted against (B) location, (C) organic content, and (D) eukaryote species abundance. For all boxplots, the horizontal box lines represent the first quartile, the median, and the third quartile. Whiskers denote the range of points between the first quartile − 1.5× the interquartile range and the third quartile + 1.5× the interquartile range.

Extended Data Fig. 5 Ecological factors significantly associated with Shannon diversity in environmental viruses.

(A) Model-averaged effect sizes, with 95% confidence intervals, of ecological metadata on Shannon diversity for 265 biologically independent samples. Factors significant in the best-subset-selected model (p < 0.05, in orange) have intervals that do not overlap the central line and are denoted with an asterisk (*) (identical to Fig. 6c). Boxplots represent Shannon diversity values plotted against (B) environment, (C) location, (D) pH, (E) total nitrogen, (F) total potassium, (G) available nitrogen, (H) organic content, and (I) eukaryote species richness. For all boxplots, the horizontal box lines represent the first quartile, the median, and the third quartile. Whiskers denote the range of points between the first quartile − 1.5× the interquartile range and the third quartile + 1.5× the interquartile range.

Extended Data Fig. 6 Ecological factors significantly associated with true diversity in environmental viruses.

(A) Model-averaged effect sizes, with 95% confidence intervals, of ecological metadata on true diversity for 265 biologically independent samples. Factors significant in the best-subset-selected model (p < 0.05, in orange) have intervals that do not overlap the central line and are denoted with an asterisk (*) (identical to Fig. 6d). True diversity values are plotted against (B) environment, (C) location, (D) pH, (E) total nitrogen, (F) total potassium, (G) available nitrogen, (H) organic content, (I) eukaryote species richness, and (J) eukaryote species true diversity. For all boxplots, the horizontal box lines represent the first quartile, the median, and the third quartile. Whiskers denote the range of points between the first quartile − 1.5× the interquartile range and the third quartile + 1.5× the interquartile range.
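The diversity measures named in these captions can be illustrated with a short sketch. The exact definitions used in the study are not given in this excerpt, so the code below rests on assumptions: Shannon diversity is computed with the natural logarithm, and "true diversity" is taken to be the exponential of the Shannon index (the effective number of equally abundant taxa). The function names and example counts are hypothetical.

```python
import math


def richness(abundances):
    """Number of taxa with non-zero abundance."""
    return sum(1 for a in abundances if a > 0)


def shannon_diversity(abundances):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with non-zero abundance."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must contain at least one positive value")
    proportions = [a / total for a in abundances if a > 0]
    return -sum(p * math.log(p) for p in proportions)


def true_diversity(abundances):
    """Assumed definition: exp(H), the effective number of equally abundant taxa."""
    return math.exp(shannon_diversity(abundances))


# Made-up abundance vectors, for illustration only.
even_community = [10, 10, 10, 10]   # four equally abundant taxa
uneven_community = [97, 1, 1, 1]    # same richness, one dominant taxon

even_true = true_diversity(even_community)
uneven_true = true_diversity(uneven_community)
```

Under this assumed definition, four equally abundant taxa give a true diversity of 4, whereas a community with the same richness but one dominant taxon scores much closer to 1, which is why richness and true diversity can respond differently to the same environmental factor.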

Extended Data Fig. 7 Ecological factors significantly associated with Shannon diversity in animal-associated viruses.

(A) Model-averaged effect sizes, with 95% confidence intervals, of ecological metadata on Shannon diversity for 265 biologically independent samples. Factors significant in the best-subset-selected model (p < 0.05, in orange) have intervals that do not overlap the central line and are denoted with an asterisk (*) (identical to Fig. 7c). Shannon diversity values are plotted against (B) environment. For all boxplots, the horizontal box lines represent the first quartile, the median, and the third quartile. Whiskers denote the range of points between the first quartile − 1.5× the interquartile range and the third quartile + 1.5× the interquartile range.

Extended Data Fig. 8 Ecological factors significantly associated with true diversity in animal-associated viruses.

(A) Model-averaged effect sizes, with 95% confidence intervals, of ecological metadata on true diversity for 265 biologically independent samples. Factors significant in the best-subset-selected model (p < 0.05, in orange) have intervals that do not overlap the central line and are denoted with an asterisk (*) (identical to Fig. 7d). True diversity values are plotted against (B) environment. For all boxplots, the horizontal box lines represent the first quartile, the median, and the third quartile. Whiskers denote the range of points between the first quartile − 1.5× the interquartile range and the third quartile + 1.5× the interquartile range.

Extended Data Fig. 9 Ecological factors significantly associated with viral richness in environmental viruses.

(A) Model-averaged effect sizes, with 95% confidence intervals, of ecological metadata on viral richness for 265 biologically independent samples. Factors significant in the best-subset-selected model (p < 0.05, in orange) have intervals that do not overlap the central line and are denoted with an asterisk (*) (identical to Fig. 6b). Boxplots represent richness values plotted against (B) environment, (C) location, (D) pH, (E) total phosphorus, (F) available nitrogen, (G) organic content, (H) eukaryote species richness, (I) eukaryote species Shannon diversity, and (J) eukaryote species true diversity. For all boxplots, the horizontal box lines represent the first quartile, the median, and the third quartile. Whiskers denote the range of points between the first quartile − 1.5× the interquartile range and the third quartile + 1.5× the interquartile range.

Extended Data Fig. 10 Ecological factors significantly associated with viral richness in animal-associated viruses.

(A) Model-averaged effect sizes, with 95% confidence intervals, of ecological metadata on viral richness for 265 biologically independent samples. Factors significant in the best-subset-selected model (p < 0.05, in orange) have intervals that do not overlap the central line and are denoted with an asterisk (*) (identical to Fig. 7b). Richness values are plotted against (B) environment, (C) location, (D) pH, (E) total phosphorus, (F) organic content, (G) eukaryote species richness, and (H) eukaryote species true diversity. For all boxplots, the horizontal box lines represent the first quartile, the median, and the third quartile. Whiskers denote the range of points between the first quartile − 1.5× the interquartile range and the third quartile + 1.5× the interquartile range.

Supplementary information

Supplementary Information

Supplementary Figs. 1–38.

Reporting Summary

Peer Review File

Supplementary Table 1

Supplementary Tables 1–3.

Supplementary Data 1

Statistical source data for Supplementary Fig. 36.

Supplementary Data 2

Statistical source data for Supplementary Fig. 37.

Supplementary Data 3

Statistical source data for Supplementary Fig. 38.

Source data

Source Data Fig. 2

Statistical source data for Fig. 2.

Source Data Fig. 6

Statistical source data for Fig. 6.

Source Data Fig. 7

Statistical source data for Fig. 7.

Source Data Extended Data Fig. 2

Statistical source data for Extended Data Fig. 2.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chen, YM., Sadiq, S., Tian, JH. et al. RNA viromes from terrestrial sites across China expand environmental viral diversity. Nat Microbiol 7, 1312–1323 (2022). https://doi.org/10.1038/s41564-022-01180-2

  • DOI: https://doi.org/10.1038/s41564-022-01180-2
