Cell. 2014 Jul 17;158(2):250-262.
doi: 10.1016/j.cell.2014.06.037.

Conducting a microbiome study

Julia K Goodrich et al. Cell. 2014.

Abstract

Human microbiome research is an actively developing area of inquiry, with ramifications for our lifestyles, our interactions with microbes, and how we treat disease. Advances depend on carefully executed, controlled, and reproducible studies. Here, we provide a Primer for researchers from diverse disciplines interested in conducting microbiome research. We discuss factors to be considered in the design, execution, and data analysis of microbiome studies. These recommendations should help researchers to enter and contribute to this rapidly developing field.


Figures

Figure 1. Conducting a Microbiome Study
The sequential steps of conducting a microbiome study are diagramed, mirroring the sections of this Primer.
Figure 2. The Maternal Effect Can Confound the Experimental Effect
(A and B) In this mock example, each point represents a gut microbial community as characterized by a set of 16S rRNA gene sequences from a single mouse sample. In principal coordinates analysis (PCoA), points that are closer together represent microbial communities that are more similar in sequence composition. Samples from two different mouse genotypes are represented, and the mice are derived from two different dams. In all panels, squares indicate wild-type, and circles indicate mutant mouse genotypes. In (A), the effect of genotype is confounded by the effect of a shared dam, whereas in (B), the effect of dam is randomized across the two genotypes.
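The design issue in Figure 2 can be checked before any sequencing is done. The following is a minimal sketch, not the authors' procedure: it flags a design in which every dam contributes only one genotype, and draws a balanced random subset from each dam-by-genotype cell. The record fields (`dam`, `genotype`) and the function names are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def is_confounded(mice):
    """Return True if every dam contributed mice of only one genotype,
    so that dam and genotype effects cannot be separated (as in panel A)."""
    genotypes_by_dam = defaultdict(set)
    for mouse in mice:
        genotypes_by_dam[mouse["dam"]].add(mouse["genotype"])
    return all(len(g) == 1 for g in genotypes_by_dam.values())


def pick_balanced(mice, per_cell, seed=0):
    """Randomly pick `per_cell` mice from each dam x genotype cell."""
    rng = random.Random(seed)
    cells = defaultdict(list)
    for mouse in mice:
        cells[(mouse["dam"], mouse["genotype"])].append(mouse)
    chosen = []
    for key in sorted(cells):
        if len(cells[key]) < per_cell:
            raise ValueError(f"not enough mice in cell {key}")
        chosen.extend(rng.sample(cells[key], per_cell))
    return chosen


# Hypothetical design A: each genotype comes from its own dam.
design_a = (
    [{"dam": "dam1", "genotype": "wt"} for _ in range(4)]
    + [{"dam": "dam2", "genotype": "mut"} for _ in range(4)]
)

# Hypothetical design B: both dams produce both genotypes.
design_b = [
    {"dam": dam, "genotype": genotype}
    for dam in ("dam1", "dam2")
    for genotype in ("wt", "mut")
    for _ in range(3)
]

print(is_confounded(design_a))  # design A mixes up dam and genotype
print(is_confounded(design_b))  # design B spreads each dam over both genotypes
selected = pick_balanced(design_b, per_cell=2)
```

A check like this only detects complete confounding; partial imbalance between dams and genotypes would still need to be handled in the statistical model.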
Figure 3. Principal Coordinates Analysis and Classification Methods
(A) Principal coordinates (PCs) from a principal coordinates analysis (PCoA) are plotted against each other to summarize the microbial community compositional differences between samples. Each point represents a single sample, and the distance between points represents how compositionally different the samples are from one another. The points are colored by health state, showing a clear difference in the microbial community composition between diseased (green) and healthy (purple). (B) Classification methods can be used to determine which OTUs discriminate between the healthy and diseased groups, and a heatmap can be used to visualize over/under representation of these OTUs in the groups. In this example, the abundances of the four discriminatory OTUs (rows) are colored from low abundance (blue) to high abundance (red) in the 47 samples (columns). Both the PCoA plot and the sample dendrogram in the heatmap show that the separation between disease and health states is not perfect. There is some overlap in the composition of these samples, though the placement of points in the PCoA plot is far from random. This observation should be supported with statistical analysis. For example, a Monte Carlo two-sample t test that compares the distribution of within-group distances to the distribution of between-group distances, applied to these data, indicates that this clustering pattern is statistically significant.
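The Monte Carlo two-sample t test mentioned in the Figure 3 caption can be sketched as follows. This is an illustration under stated assumptions, not the analysis used for the figure: the sample coordinates are invented, Euclidean distance stands in for a community distance such as UniFrac, the statistic is Welch's t, and the null distribution is built by shuffling the group labels of samples. Other permutation schemes exist and may give different p values.

```python
import math
import random
from itertools import combinations


def welch_t(a, b):
    """Welch's t statistic for two lists of numbers (mean of a minus mean of b)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    var_a = sum((x - mean_a) ** 2 for x in a) / (len(a) - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (len(b) - 1)
    denom = math.sqrt(var_a / len(a) + var_b / len(b))
    return (mean_a - mean_b) / denom if denom > 0 else 0.0


def split_distances(dist, labels):
    """Split pairwise distances into within-group and between-group lists."""
    within, between = [], []
    for i, j in combinations(range(len(labels)), 2):
        (within if labels[i] == labels[j] else between).append(dist[i][j])
    return within, between


def monte_carlo_t_test(dist, labels, n_perm=999, seed=0):
    """One-sided test: are between-group distances larger than within-group ones?"""
    rng = random.Random(seed)
    within, between = split_distances(dist, labels)
    observed = welch_t(between, within)
    hits = 0
    shuffled = list(labels)
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the link between samples and groups
        w, b = split_distances(dist, shuffled)
        if welch_t(b, w) >= observed:
            hits += 1
    # Add 1 to numerator and denominator so the p value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Invented 2-D coordinates for five "healthy" and five "diseased" samples.
points = [(0.0, 0.1), (0.3, 0.2), (0.1, 0.5), (0.4, 0.4), (0.2, 0.0),
          (3.0, 3.1), (3.3, 2.9), (2.8, 3.4), (3.1, 3.0), (3.5, 3.3)]
labels = ["healthy"] * 5 + ["diseased"] * 5
dist = [[math.dist(p, q) for q in points] for p in points]

t_obs, p_value = monte_carlo_t_test(dist, labels, n_perm=999, seed=1)
print(f"t = {t_obs:.2f}, p = {p_value:.3f}")
```

Because pairwise distances are not independent observations, the parametric t distribution does not apply here, which is why the p value is taken from the permuted statistics instead.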
Figure 4. Use Caution when Applying Unsupervised Classification to Data Gradients
(A–C) In this simulated microbiome data set, a principal coordinates analysis (PCoA) was performed, and the first two principal coordinates, PC1 and PC2, are plotted. The exact same set of points is shown in panels (A–E) but is colored differently. In (A), samples are all colored black to show that they form gradients along PCs 1 and 2. In (B) and (C), two sets of clusters were designated by bisecting the spread of samples. In (B), half of the samples form the Red cluster, and the second half form the Blue cluster along PC1. In (C), half of the samples are in the Green cluster, and the second half form the Yellow cluster along PC2. In (B) and (C), starplots display inferred clusters; this display can give the misleading impression of distinct clusters (see A; the data structure consists of gradients, not distinct clusters). (D–G) In (D) and (E), the samples are colored according to the abundances of the taxa that drive their separation along PCs 1 and 2. (D) The abundance of sequences belonging to the Bacteroidetes phylum drives the spread of samples along PC1; (E) abundances of Proteobacteria in the samples drive their spread along PC2. When the relative abundances for these phyla in samples are averaged (F and G), it is apparent that the Blue samples, which are at the “low end” of the Bacteroidetes gradient, have lower means than the Red samples, which are at the high end (F). Similarly, because the Yellow/Green samples are spread along PC2 according to their abundance of Proteobacteria, these two groups will also exhibit different mean abundances (G). Therefore, plotting mean values of the abundances of taxa that drive the gradients in the PCoA plots does not constitute a validation of the PCoA patterns.
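The circularity described in the Figure 4 caption can be reproduced with a few lines of simulation. The sketch below is a hypothetical one-dimensional stand-in for the figure, not the authors' data set: abundances are drawn from a continuous uniform range (a gradient with no clusters), the samples are bisected at the median, and the two halves are compared. The range, sample count, and function name are assumptions made for illustration.

```python
import random
from statistics import mean, median


def bisect_gradient(n_samples=200, seed=42):
    """Simulate a smooth abundance gradient and split it at the median."""
    rng = random.Random(seed)
    # Hypothetical relative abundances spread evenly along one axis (no clusters).
    abundance = [rng.uniform(0.1, 0.9) for _ in range(n_samples)]
    cut = median(abundance)
    low = [a for a in abundance if a < cut]    # analogous to the "Blue" half
    high = [a for a in abundance if a >= cut]  # analogous to the "Red" half
    ordered = sorted(abundance)
    # Largest empty stretch between neighboring samples along the gradient.
    largest_gap = max(b - a for a, b in zip(ordered, ordered[1:]))
    return mean(low), mean(high), largest_gap


mean_low, mean_high, largest_gap = bisect_gradient()
print(f"mean of low half:  {mean_low:.3f}")
print(f"mean of high half: {mean_high:.3f}")
print(f"largest gap between neighboring samples: {largest_gap:.3f}")
```

The two halves differ in mean abundance by construction, yet the largest gap between neighboring samples is small compared with that difference, so the difference in means says nothing about whether distinct clusters exist.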