Bioinformatics. 2010 Oct 15;26(20):2571-7.
doi: 10.1093/bioinformatics/btq406. Epub 2010 Jul 14.

Probabilistic analysis of gene expression measurements from heterogeneous tissues


Timo Erkkilä et al. Bioinformatics.

Abstract

Motivation: Tissue heterogeneity, arising from multiple cell types, is a major confounding factor in experiments that focus on studying cell types, e.g. their expression profiles, in isolation. Although sample heterogeneity can be addressed by manual microdissection before conducting experiments, computational treatment of heterogeneous measurements has become a reliable alternative that performs this microdissection in silico. Favoring computation over manual purification has its advantages, such as reduced time consumption, measuring the responses of multiple cell types simultaneously, keeping samples free of external perturbations and leaving the yield of molecular content unaltered.

Results: We formalize a probabilistic model, DSection, and show with simulations as well as with real microarray data that DSection attains increased modeling accuracy in terms of (i) estimating cell-type proportions of heterogeneous tissue samples, (ii) estimating replication variance and (iii) identifying differential expression across cell types under various experimental conditions. As our reference we use the corresponding linear regression model, which mirrors the performance of the majority of current non-probabilistic modeling approaches.

Availability and software: All code is written in Matlab and is freely available upon request as well as at the project web page http://www.cs.tut.fi/~erkkila2/. Furthermore, a web application for DSection exists at http://informatics.systemsbiology.net/DSection.
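The linear regression reference model named in the Results can be illustrated with a minimal sketch. It assumes that each heterogeneous measurement is a proportion-weighted sum of cell-type-specific expression levels and that the cell-type proportions are known and fixed; the function names and toy data are hypothetical and are not taken from the authors' Matlab code. Unlike DSection, this sketch does not model uncertainty in the proportions.

```python
# Illustrative least-squares deconvolution for one gene (not the DSection code).
# Assumed model: measurement = sum over cell types of proportion * expression.

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: proportions do not identify cell types")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def estimate_cell_type_expression(proportions, measurements):
    """Least-squares estimate of per-cell-type expression for one gene.

    proportions: one row per sample, one column per cell type.
    measurements: measured expression of the gene in each sample.
    """
    n_types = len(proportions[0])
    # Normal equations: (P^T P) x = P^T y.
    ptp = [[sum(row[i] * row[j] for row in proportions) for j in range(n_types)]
           for i in range(n_types)]
    pty = [sum(row[i] * y for row, y in zip(proportions, measurements))
           for i in range(n_types)]
    return solve_linear_system(ptp, pty)


# Hypothetical noise-free toy data: two cell types with expression 10 and 2.
proportions = [[0.25, 0.75], [0.75, 0.25], [0.5, 0.5]]
true_expression = [10.0, 2.0]
measurements = [sum(p * x for p, x in zip(row, true_expression))
                for row in proportions]
estimate = estimate_cell_type_expression(proportions, measurements)
print(estimate)
```

With noise-free input the estimate recovers the assumed expression levels up to floating-point error. With noisy proportions the same estimator inherits their error, which is the situation the abstract says DSection is designed to handle.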

Contact: timo.p.erkkila@tut.fi; harri.lahdesmaki@tut.fi


Figures

Fig. 1.
Analysis results with simulated data: 3 cell types, 2 experimental conditions, 700 genes and 14 samples (seven for each experimental condition). (a) Estimation of cell-type proportions (bright spots), given noisy priors (faint spots). (b) ROC curves of the compared methods (solid lines). As a reference, the best performance, obtained by plugging the true cell-type proportions into the linear regression model and performing the analysis, and the worst performance (the diagonal in ROC plots) are shown as dashed lines. (c) Estimation of measurement SD (given as formula image). Estimation of measurement SD for (d) the linear regression model with fixed cell-type proportions, (e) DSection with fixed cell-type proportions and (f) DSection with varying cell-type proportions, where estimates are colored according to the true average differential expression of probes; higher color intensity means higher average differential expression. Clearly, SD estimation accuracy for highly differentially expressed genes is poor when uncertainty in cell-type proportions is not properly accounted for [(d) and (e) versus (f)].
Fig. 2.
Analysis results with Affymetrix data: 2 cell types, 1 experimental condition, ∼15 000 genes and 6 samples (25%/75% and vice versa). (a) Estimation of cell-type proportions (bright spots), given noisy priors (faint spots). (b) ROC curves of the compared methods. Estimation of measurement SD for (c) the linear regression model with fixed cell-type proportions and (d) DSection with varying cell-type proportions, where estimates are colored according to the true average differential expression of probes. Again, as with simulated data, SD estimation accuracy for highly differentially expressed genes is poor when uncertainty in cell-type proportions is not properly accounted for [(c) versus (d)].
Fig. 3.
MAD for cell-type proportion estimates (referenced against the ground truth). MAD for the linear regression model serves as the baseline, because that model does not support cell-type proportion estimation; anything below that baseline (black bars) is considered an improvement. In terms of MAD, DSection (gray bars) is able to recover the true cell-type proportions from noisy estimates.
