PLoS One. 2016 Jan 20;11(1):e0145252.
doi: 10.1371/journal.pone.0145252. eCollection 2016.

Confounding Factors in the Transcriptome Analysis of an In-Vivo Exposure Experiment

Oskar Bruning et al. PLoS One.

Abstract

Confounding factors: In transcriptomics experimentation, confounding factors frequently exist alongside the intended experimental factors and can severely influence the outcome of a transcriptome analysis. Confounding factors are regularly discussed in the methodological literature, but their actual, practical impact on the outcome and interpretation of transcriptomics experiments is, to our knowledge, not documented. For instance, in-vivo experimental factors such as Individual, Sample-Composition and Time-of-Day are potentially formidable confounding factors. To study these confounding factors, we designed an extensive in-vivo transcriptome experiment (n = 264) with UVR exposure of murine skin, in which six consecutive samples were taken from each individual mouse (n = 64).

Analysis approach: Evaluating the confounding factors Sample-Composition, Time-of-Day, Handling-Stress and Individual-Mouse identified many genes affected by them, some with expression differences of more than 30-fold. The most prominent confounding factor was Sample-Composition, caused by mouse-dependent differences in skin composition, sampling variation and/or influx/efflux of mobile cells. Although we can only evaluate these effects for known cell-type-specific genes in our complex, heterogeneous samples, it is clear that the observed variations also affect the cumulative expression levels of many other, non-cell-type-specific genes.
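Why composition shifts also move genes that are not cell-type-specific can be illustrated under a simple linear mixing assumption: the bulk signal of each gene is the proportion-weighted sum of its expression in the constituent cell types. This is an assumption made here for illustration, not a model taken from the paper, and the matrix `CELL_TYPE_EXPR` and the proportions below are hypothetical numbers.

```python
import numpy as np

# Hypothetical cell-type expression levels (rows: genes, columns: cell types).
# Gene 0 is a marker expressed only in cell type 0; gene 1 is expressed in
# all three cell types at different levels, i.e. it is NOT cell-type-specific.
CELL_TYPE_EXPR = np.array([
    [100.0, 0.0, 0.0],
    [10.0, 40.0, 20.0],
])

def bulk_expression(proportions) -> np.ndarray:
    """Bulk expression of a heterogeneous sample under a linear mixing assumption:
    each gene's signal is the proportion-weighted sum over cell types."""
    p = np.asarray(proportions, dtype=float)
    if not np.isclose(p.sum(), 1.0):
        raise ValueError("cell-type proportions must sum to 1")
    return CELL_TYPE_EXPR @ p

# Two hypothetical biopsies with identical biology but a different cell mix.
sample_a = bulk_expression([0.6, 0.2, 0.2])
sample_b = bulk_expression([0.2, 0.6, 0.2])
fold_change = sample_a / sample_b  # per-gene fold change caused by composition alone
print(fold_change)
```

In this toy case the marker gene changes 3-fold and the non-specific gene changes 0.6-fold, although no gene is regulated within any cell type; only the mix differs.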

ANOVA: ANOVA can only attempt to neutralize the effects of well-defined confounding factors, such as Individual-Mouse, on the experimental factors UV-Dose and Recovery-Time. Moreover, by definition, ANOVA yields only reproducible gene-expression differences, and we found these differences to be very small compared with the fold changes induced by the confounding factors, which calls into question the biological relevance of the ANOVA-detected differences. Furthermore, many of the differentially expressed genes found by ANOVA were also present in the gene clusters associated with the confounding factors.

Conclusion: Our overall conclusion is therefore that confounding factors have a major impact on the outcome of in-vivo transcriptomics experiments, and that the set-up, analysis and interpretation of such experiments should be approached with the utmost prudence.


Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Fig 1. Variation caused by confounding factors.
Selected samples illustrating the influence of confounding factors (Table 2), derived from the PCA of the complete experiment. Sample means are used except in panels D and E. A, UV-Dose (green = untreated, red = high dose); B, Recovery-Time (hours) ≈ Time-of-Day; C, Recovery-Time (light green = early, green = late); D, Trp53-Genotype (green = WT, blue = Trp53-72P mutant); E, Handling- & Biopsy-Stress (light green = early recovery, green = late recovery); and F, Individual-Mouse (yellow = #31, black = #55).
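A PCA of samples such as the one behind Fig 1 can be sketched with a plain SVD of the gene-centred expression matrix. This is a generic PCA, not the authors' pipeline; the genes x samples layout, the function name `pca_sample_scores` and the simulated data (a +5 shift on ten genes in half of the samples, mimicking a composition difference) are assumptions for illustration.

```python
import numpy as np

def pca_sample_scores(expr: np.ndarray, n_components: int = 2):
    """PCA of samples via SVD.

    expr: genes x samples matrix of (e.g. log2) expression values.
    Returns (scores, explained): samples x n_components PC scores and the
    fraction of total variance explained by each returned component.
    """
    x = np.asarray(expr, dtype=float).T          # samples x genes
    x = x - x.mean(axis=0, keepdims=True)        # centre every gene across samples
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2 / np.sum(s ** 2))[:n_components]
    return scores, explained

# Hypothetical data: 50 genes x 6 samples; samples 0-2 carry an extra +5
# on genes 0-9, standing in for a sample-composition difference.
rng = np.random.default_rng(1)
expr = rng.normal(0.0, 0.1, size=(50, 6))
expr[:10, :3] += 5.0
scores, explained = pca_sample_scores(expr)
print(np.round(scores[:, 0], 2), np.round(explained, 2))
```

The sign of a principal component is arbitrary, so only the separation of the two sample groups along PC1, not its direction, is meaningful.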
Fig 2. Discovering clusters of genes with very similar gene expression.
A: PCA plot of all genes using the untreated WT samples, with Subsets I and II (155 and 109 genes, respectively) indicated. Red: genes in Subset I that are also in Cluster SC-A; blue: genes in Subset II that are also in Cluster SC-B. B: Heatmap of microarray signals in all untreated WT samples for the 5,000 genes with the largest variance over the whole experiment; the five Sample-Composition clusters and the one Time-of-Day-specific cluster described in the main text are indicated. C: Zoom-in on the five Sample-Composition clusters, revealing the similar expression within each cluster across all WT samples.
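Selecting the highest-variance genes for a heatmap, and scaling each gene's profile before comparing shapes, might look as follows. The captions do not state which scaling was used; per-gene z-scoring is an assumption here, and `top_variance_genes` and `scale_rows` are hypothetical helper names.

```python
import numpy as np

def top_variance_genes(expr: np.ndarray, n_top: int = 5000) -> np.ndarray:
    """Row indices of the n_top genes with the largest variance across samples."""
    var = np.var(expr, axis=1, ddof=1)
    # Sort by descending variance; a stable sort keeps ties in input order.
    return np.argsort(-var, kind="stable")[:n_top]

def scale_rows(expr: np.ndarray) -> np.ndarray:
    """Z-score every gene (row) to mean 0 and standard deviation 1 across samples.
    Constant rows are mapped to 0 instead of dividing by zero."""
    expr = np.asarray(expr, dtype=float)
    mean = expr.mean(axis=1, keepdims=True)
    sd = expr.std(axis=1, ddof=1, keepdims=True)
    return (expr - mean) / np.where(sd == 0, 1.0, sd)

# Toy genes x samples matrix: one flat gene and two varying genes.
expr = np.array([[1.0, 1.0, 1.0],
                 [0.0, 5.0, 10.0],
                 [2.0, 3.0, 4.0]])
idx = top_variance_genes(expr, n_top=2)
heatmap_input = scale_rows(expr[idx])
print(idx, heatmap_input)
```

After scaling, genes 1 and 2 have identical profiles even though their absolute ranges differ tenfold, which is what makes shape-based clustering of co-varying genes possible.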
Fig 3. Self-organizing maps of gene clusters with similar gene expression.
2x2 Self-Organizing Maps (SOMs) per individual untreated WT mouse (#) of the scaled gene-expression profiles of the gene clusters from the heatmap in Fig 2 (S3 Table), for both the Early (0, 1, 2, 3, 4, 5 hours) and Late (0, 7.5, 9, 10.5, 12 hours) time points.
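A 2x2 self-organizing map groups scaled expression profiles onto four prototype profiles arranged on a small grid. The sketch below is a minimal, generic online Kohonen SOM in NumPy; the paper does not specify its SOM implementation, so the function `train_som`, the linear decay schedules and the parameter values (`n_iter`, `lr0`, `sigma0`) are all assumptions, as are the simulated rising and falling profiles.

```python
import numpy as np

def train_som(data, grid=(2, 2), n_iter=2000, lr0=0.5, sigma0=1.0, seed=0):
    """Minimal online Kohonen SOM.

    data: n_profiles x n_timepoints matrix (e.g. row-scaled expression).
    Returns (weights, assignments): node prototypes (n_nodes x n_timepoints)
    and the index of the best-matching node for every profile.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    rows, cols = grid
    # Grid coordinates of each node, used by the neighbourhood function.
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise the prototypes from randomly chosen input profiles.
    start = rng.choice(len(data), size=rows * cols, replace=len(data) < rows * cols)
    weights = data[start].copy()
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                # linearly decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 1e-3   # shrinking neighbourhood radius
        x = data[rng.integers(len(data))]      # one random profile per step
        bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
        grid_d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
        h = np.exp(-grid_d2 / (2.0 * sigma ** 2))   # Gaussian neighbourhood
        weights += lr * h[:, None] * (x - weights)
    dists = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return weights, dists.argmin(axis=1)

# Hypothetical scaled profiles over six time points: 20 rising, 20 falling.
rng = np.random.default_rng(0)
t = np.linspace(-1.0, 1.0, 6)
profiles = np.vstack([t + rng.normal(0, 0.05, (20, 6)),
                      -t + rng.normal(0, 0.05, (20, 6))])
weights, assign = train_som(profiles)
print(assign)
```

Rising and falling profiles end up on different nodes; with real data, a per-mouse SOM like those in Fig 3 would show whether a gene cluster keeps the same temporal shape in every animal.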
Fig 4. Examples of time-related gene expression from Cluster TD-A.
A: Expression profiles over time of example genes involved in circadian rhythm, for untreated WT samples. B: Heatmap of the 19 genes of Cluster Time-of-Day (TD)-A. C: 2x2 SOMs per mouse of the scaled gene-expression profiles of gene Cluster TD-A.
Fig 5. Examples of time-related gene expression from Group TD-B.
A: Time-related expression profiles of example genes from Group TD-B, for untreated WT samples. B: Boxplot of the variation of the 333 genes in Group TD-B. C: 2x2 SOMs per mouse of the scaled gene-expression profiles of gene Group TD-B.
Fig 6. Examples of biopsy-stress-related gene expression from Group HS-A.
A: Re-labeling of samples from time points to biopsy-order number (B#; cf. main text). B: Handling-Stress expression profiles of example genes from Group HS-A, for untreated WT samples. C: 2x2 SOMs per mouse of the scaled gene-expression profiles of the 83 genes in Group HS-A.
Fig 7. Examples of individual-mouse-related gene expression from Group IM-A.
A: Expression profiles of example genes from Group IM-A, for untreated WT samples. B: Boxplot of the absolute ANOVA Individual-Mouse effect of all genes, per untreated WT mouse. C: 2x2 SOMs per mouse of the scaled gene-expression profiles of all 391 genes in Group IM-A.
Fig 8. The contribution of the various ANOVA components.
The ANOVA components of the model Y = μ + Individual-Mouse effect + UV-Dose & Recovery-Time effect + error, evaluated in boxplots for the indicated groups and clusters of genes.
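For a single gene, the additive model Y = μ + Individual-Mouse effect + UV-Dose & Recovery-Time effect + error can be fitted by ordinary least squares on a one-hot design matrix. The sketch below is a generic fit, not the authors' code: `additive_anova_effects` is a hypothetical name, the combined UV-Dose & Recovery-Time level is treated as one factor, and the design is assumed to be connected so that differences between levels are estimable. The toy data are noise-free with known effects.

```python
import numpy as np

def additive_anova_effects(y, mouse, treatment):
    """Least-squares fit of y = mu + mouse effect + treatment effect + error
    for one gene.

    y: expression value per sample; mouse, treatment: per-sample labels, where
    treatment stands for the combined UV-Dose & Recovery-Time level.
    Returns centred mouse effects, centred treatment effects and residuals.
    Assumes a connected design, so differences between levels are estimable.
    """
    y = np.asarray(y, dtype=float)
    mice, m_idx = np.unique(mouse, return_inverse=True)
    treats, t_idx = np.unique(treatment, return_inverse=True)
    n = len(y)
    # Over-parameterised design: intercept | mouse dummies | treatment dummies.
    X = np.zeros((n, 1 + len(mice) + len(treats)))
    X[:, 0] = 1.0
    X[np.arange(n), 1 + m_idx] = 1.0
    X[np.arange(n), 1 + len(mice) + t_idx] = 1.0
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # minimum-norm LS solution
    residuals = y - X @ beta                      # unique even if X is rank-deficient
    # Only differences between levels are identifiable, so centre each factor.
    mouse_eff = beta[1:1 + len(mice)]
    treat_eff = beta[1 + len(mice):]
    mouse_eff = mouse_eff - mouse_eff.mean()
    treat_eff = treat_eff - treat_eff.mean()
    return dict(zip(mice, mouse_eff)), dict(zip(treats, treat_eff)), residuals

# Hypothetical noise-free gene: mu = 5, known mouse and treatment effects.
mice = ["m1", "m1", "m2", "m2", "m3", "m3"]
treat = ["t1", "t2"] * 3
true_m = {"m1": 1.0, "m2": 0.0, "m3": -1.0}
true_t = {"t1": 2.0, "t2": -2.0}
y = [5.0 + true_m[m] + true_t[t] for m, t in zip(mice, treat)]
m_eff, t_eff, res = additive_anova_effects(y, mice, treat)
print(np.abs(list(m_eff.values())))  # absolute Individual-Mouse effects
```

Taking the absolute values of the centred mouse effects gives the kind of per-mouse quantity summarized in the Fig 7B boxplot; repeating the fit over all genes would yield one value per gene and mouse.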
