mSystems. 2020 Mar 17;5(2):e00869-19.
doi: 10.1128/mSystems.00869-19.

FORENSIC: an Online Platform for Fecal Source Identification

Adélaïde Roguet et al. mSystems. 2020.

Abstract

Sewage overflows, agricultural runoff, and stormwater discharges introduce fecal pollution into surface waters. Distinguishing these sources is critical for evaluating water quality and formulating remediation strategies. With the falling costs of sequencing, microbial community-based water quality assessment tools are under development. However, their application is limited by the need to build reference libraries, which requires extensive sampling of sources and bioinformatic expertise. Here, we introduce FORest Enteric Source IdentifiCation (FORENSIC; https://forensic.sfs.uwm.edu/), an online, library-independent source tracking platform that applies random forest classification to 16S rRNA gene amplicon sequences to identify common fecal contamination sources, including humans, domestic pets, and agricultural animals, in environmental samples. FORENSIC relies on a broad reference signature database of Bacteroidales and Clostridiales, two predominant bacterial groups that have coevolved with their hosts. As a result, these groups show cohesive and reliable assemblage patterns within mammalian species or among species sharing the same diet or physiology. We built a scalable and extensible platform and tested its global applicability using samples collected in geographically distant locations. This Web application offers a fast and intuitive approach to fecal source identification, particularly in sewage-contaminated waters.

IMPORTANCE FORENSIC is an online platform for identifying sources of fecal pollution without the need to create reference libraries. FORENSIC builds on the ability of random forest classification to extract cohesive microbial source signatures despite individual variability, to create classifiers from them, and to detect these signatures in environmental samples. We primarily focused on defining sewage signals, which are associated with a high human health risk in polluted waters.
To test for fecal contamination sources, the platform requires only paired-end reads targeting the V4 or V6 region of the 16S rRNA gene. We also demonstrated that V4V5 reads trimmed to the V4 positions can be used to generate the reference signature. The systematic workflow we describe for creating and validating the signatures could be applied in many disciplines. As the gap between advancing technology and practical applications widens, this platform makes sequence-based water quality assessments accessible to the public health and water resource communities.
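
The classification idea described above can be illustrated with a toy random forest. The sketch below is not FORENSIC's implementation: it assumes a forest of decision stumps (depth-one trees), each trained on a bootstrap sample of the reference samples and a random subset of ASV features, and it treats the fraction of trees voting for a source as the "voting tree probability" mentioned in FIG 3. The function names (fit_forest, vote_fraction) and the abundance data are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def fit_stump(X, y, features):
    """Depth-one tree: the (feature, threshold) split with the lowest weighted Gini impurity."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    _, f, t, left, right = best
    majority = Counter(y).most_common(1)[0][0]
    # An empty side (nothing above the threshold) falls back to the overall majority label.
    left_label = Counter(left).most_common(1)[0][0] if left else majority
    right_label = Counter(right).most_common(1)[0][0] if right else majority
    return f, t, left_label, right_label


def fit_forest(X, y, n_trees=50, n_features=None, seed=0):
    """Train `n_trees` stumps, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    k = n_features or max(1, int(p ** 0.5))  # sqrt(p) features per tree by default
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        feats = rng.sample(range(p), k)
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def vote_fraction(forest, x, label):
    """Fraction of trees voting for `label`, used here as a 'voting tree probability'."""
    votes = [lt if x[f] <= t else rt for f, t, lt, rt in forest]
    return votes.count(label) / len(votes)


# Invented relative abundances of three ASVs in four sewage and four non-sewage samples.
X = [[0.30, 0.01, 0.20], [0.25, 0.02, 0.18], [0.35, 0.00, 0.22], [0.28, 0.03, 0.19],
     [0.01, 0.30, 0.02], [0.02, 0.25, 0.03], [0.00, 0.28, 0.01], [0.03, 0.33, 0.04]]
y = ["sewage"] * 4 + ["other"] * 4
forest = fit_forest(X, y, n_trees=50, seed=0)
print(vote_fraction(forest, [0.31, 0.00, 0.21], "sewage"))  # close to 1
print(vote_fraction(forest, [0.00, 0.31, 0.01], "sewage"))  # close to 0
```

Bootstrapping and random feature subsets are what let a forest tolerate individual variability: no single sample or ASV dominates every tree, so a source signature is recognized when it is consistent across many partial views of the reference data.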

Keywords: 16S rRNA gene; Bacteroidales; Clostridiales; high-throughput sequencing; microbial source tracking; random forest classification; toolkit.

Figures

FIG 1
Phylogeny and distribution of the dominant V4 sequences across the fecal microbiota of eight host species. Ten fecal samples were averaged per host, except for dog, chicken, and goose, which were represented by 7, 6, and 8 samples, respectively. Only sequences (n = 287) with an average relative abundance per host of ≥0.5% are displayed. Colors in the inner circle depict phyla. Gray clades indicate the bacterial orders dominant in at least one host. Log-transformed relative abundances were normalized by the maximum abundance for each host. Fuso., Fusobacteriales; Lacto., Lactobacillales. The tree was rooted using Halobellus ramosii strain S2FP14 (GenBank accession no. NR_145608.1). Sequences were aligned using MUSCLE as implemented in MEGA (61). The tree was generated using the interactive Tree Of Life (iTOL) (62).
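
The per-host averaging, abundance filter, and per-host normalization described in the FIG 1 caption can be sketched as follows. Two details are assumptions not stated in the caption: the log transform is taken as log10(1 + percent abundance), so that absent ASVs map to zero, and an ASV is kept if it reaches the 0.5% average in at least one host. Function names and data are hypothetical.

```python
import math


def mean_relative_abundance(samples):
    """Average each ASV's relative abundance (a fraction) across one host's samples."""
    asvs = {a for s in samples for a in s}
    return {a: sum(s.get(a, 0.0) for s in samples) / len(samples) for a in asvs}


def dominant_asvs(host_means, threshold=0.005):
    """ASVs whose mean relative abundance reaches `threshold` (0.5%) in at least one host."""
    return sorted({a for means in host_means.values()
                   for a, v in means.items() if v >= threshold})


def log_max_normalize(means, asvs):
    """Assumed transform: log10(1 + percent abundance), scaled by the host's largest value."""
    logged = {a: math.log10(1 + 100 * means.get(a, 0.0)) for a in asvs}
    top = max(logged.values())
    return {a: (v / top if top > 0 else 0.0) for a, v in logged.items()}


# Invented example: two hosts, two samples each.
hosts = {
    "cow": [{"asv1": 0.10, "asv2": 0.002}, {"asv1": 0.08, "asv2": 0.004}],
    "pig": [{"asv2": 0.05, "asv3": 0.001}, {"asv2": 0.07, "asv3": 0.003}],
}
host_means = {h: mean_relative_abundance(s) for h, s in hosts.items()}
keep = dominant_asvs(host_means)  # ['asv1', 'asv2']; asv3 never reaches 0.5%
heat = {h: log_max_normalize(m, keep) for h, m in host_means.items()}
```

Normalizing by each host's maximum puts every host on a 0 to 1 scale, so the most abundant ASV of each host is shown at full intensity regardless of how diverse that host's microbiota is.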
FIG 2
Bray-Curtis analysis of the Bacteroidales assemblage for the V4 region of the 16S rRNA gene. The color code shows the host, the type of sample (used to train or to test the classifier), the 16S rRNA gene region initially amplified, and the geographical origin of the samples. Samples were ordered by source from the most similar (left) to the most dissimilar (right) based on the average intrasource Bray-Curtis dissimilarity. h., horse. Bray-Curtis analyses of the V4 Clostridiales assemblages and of the V6 Bacteroidales and Clostridiales assemblages are presented in Fig. S2 and S3 in the supplemental material, respectively.
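
The ordering described in the FIG 2 caption relies on the standard Bray-Curtis dissimilarity, the sum of absolute abundance differences divided by the sum of all abundances in the two samples. The sketch below computes it, averages it over all pairs of samples within a source, and sorts sources from most to least cohesive. It is an illustration under that reading of the caption; the function names and profiles are hypothetical.

```python
from itertools import combinations
from statistics import mean


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors: sum|u-v| / sum(u+v)."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0


def mean_intrasource_dissimilarity(profiles):
    """Average Bray-Curtis dissimilarity over all pairs of samples from one source."""
    pairs = list(combinations(profiles, 2))
    return mean(bray_curtis(u, v) for u, v in pairs) if pairs else 0.0


def order_sources(by_source):
    """Sources from most cohesive (lowest mean dissimilarity) to least cohesive."""
    return sorted(by_source, key=lambda s: mean_intrasource_dissimilarity(by_source[s]))


# Invented Bacteroidales profiles (relative abundances of three ASVs).
by_source = {
    "cow": [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]],
    "human": [[0.5, 0.3, 0.2], [0.4, 0.4, 0.2]],
}
print(order_sources(by_source))  # ['human', 'cow']
```

Here the two human samples differ by a Bray-Curtis value of about 0.1 and the two cow samples by 0.6, so human is placed first as the more cohesive source.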
FIG 3
FORENSIC interactive report, including (a) a source identification predictions table and (b) a bubble plot characterizing the fecal signature in the submitted samples. (a) Colors indicate whether the fecal source signature was characterized (red) or not (white or green) in the tested sample. Dark green indicates that, at the observed voting tree probability, the specificity of the classifier was at least 80%; light green indicates a specificity of <80%. Global classifiers are represented by black rings. Discarded classifiers are symbolized by black circles. (b) Bubbles represent all the amplicon sequence variants (ASVs) that compose the classifiers for a given bacterial group. Blue bubbles show the ASVs recovered from the tested sample; red bubbles show those that were not recovered. Bubble sizes are proportional to the relative abundances of the ASVs in the tested sample. Sources are represented by the outer arcs; the longer the arc, the more that source contributed to the fecal pollution (among the sources investigated). The example shows the V4 Clostridiales profile recovered from a Chinese surface-water pond (SRA accession number SRR6037827) and classified as sewage by random forest. A total of 74 of the 385 ASVs that compose all the classifiers were found in the sample; 55 and 14 of these ASVs were associated with the sewage and dog classifiers, respectively. The largest arc (representing 91% of the relative abundance of all the fecally associated ASVs) was associated with the sewage signature (bottom right). The second largest arc was associated with the dog signature and the third with the pig signature. The interactive version of this figure is available at https://forensic.sfs.uwm.edu/result/example. Data from Hägglund et al. and Hu et al. (16, 63). See Table S5 in the supplemental material for the full list of individual predictions.
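
The arc lengths and the dark/light green distinction in FIG 3 can be approximated as follows. Each source's arc is taken as its share of the summed relative abundance of all recovered classifier ASVs. Specificity at the observed voting tree probability is assumed to be the fraction of non-target reference samples whose vote probability falls below that value, with 80% as the dark green cutoff from the caption. This is a hypothetical reconstruction, not the platform's code, and the ASV assignments and probabilities are invented.

```python
from collections import defaultdict


def recovered_asvs(sample, asv_to_source):
    """Classifier ASVs detected (abundance > 0) in the sample: the 'blue bubbles'."""
    return {a for a in asv_to_source if sample.get(a, 0.0) > 0}


def source_arc_shares(sample, asv_to_source):
    """Each source's share of the relative abundance of all fecally associated ASVs."""
    totals = defaultdict(float)
    for asv in recovered_asvs(sample, asv_to_source):
        totals[asv_to_source[asv]] += sample[asv]
    grand = sum(totals.values())
    return {src: v / grand for src, v in totals.items()} if grand else {}


def specificity_at(prob, negative_probs):
    """Assumed specificity if `prob` were the cutoff: share of non-target samples below it."""
    return sum(p < prob for p in negative_probs) / len(negative_probs)


def green_shade(prob, negative_probs):
    """Dark green when specificity at the observed probability is >= 80%, else light green."""
    return "dark green" if specificity_at(prob, negative_probs) >= 0.8 else "light green"


# Invented sample and ASV-to-classifier assignments; "x" belongs to no classifier.
sample = {"a1": 0.09, "a2": 0.01, "a3": 0.05, "x": 0.5}
asv_to_source = {"a1": "sewage", "a2": "dog", "a3": "sewage", "a4": "pig"}
shares = source_arc_shares(sample, asv_to_source)  # sewage ~0.93, dog ~0.07
negatives = [0.10, 0.20, 0.70, 0.30, 0.05]  # vote probabilities of non-target samples
print(green_shade(0.6, negatives), green_shade(0.25, negatives))  # dark green light green
```

ASVs that belong to no classifier (such as "x") are excluded before computing shares, which mirrors the caption's statement that arc lengths are relative to the fecally associated ASVs among the sources investigated.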

References

1. World Health Organization. 2011. Guidelines for drinking-water quality, 4th ed. World Health Organization, Geneva, Switzerland.
2. Santo Domingo JW, Ashbolt NJ. 2008. Fecal pollution of water, 1–12. In Cleveland CJ (ed), Encyclopedia of Earth. National Council for Science and the Environment, Washington, DC.
3. DeFlorio-Barker S, Wing C, Jones RM, Dorevitch S. 2018. Estimate of incidence and cost of recreational waterborne illness on United States surface waters. Environ Heal A Glob Access Sci Source 17:3.
4. Soller JA, Schoen ME, Bartrand T, Ravenscroft JE, Ashbolt NJ. 2010. Estimated human health risks from exposure to recreational waters impacted by human and non-human sources of faecal contamination. Water Res 44:4674–4691. doi:10.1016/j.watres.2010.06.049.
5. Schoen ME, Ashbolt NJ. 2010. Assessing pathogen risk to swimmers at non-sewage impacted recreational beaches. Environ Sci Technol 44:2286–2291. doi:10.1021/es903523q.