Bioinformatics. 2012 Nov 15;28(22):2861-9.
doi: 10.1093/bioinformatics/bts561. Epub 2012 Sep 26.

integIRTy: a method to identify genes altered in cancer by accounting for multiple mechanisms of regulation using item response theory

Pan Tong et al. Bioinformatics.

Abstract

Motivation: Identifying genes altered in cancer plays a crucial role in both understanding the mechanism of carcinogenesis and developing novel therapeutics. Various mechanisms of regulation are known to lead to gene dysfunction, including copy number change, methylation, abnormal expression and mutation. All these types of alterations can now be interrogated simultaneously by different types of assays. Although many methods have been proposed to identify altered genes from a single assay, no method systematically handles multiple assays that account for different alteration types.

Results: In this article, we propose a novel method, integration using item response theory (integIRTy), which identifies altered genes through an integrated analysis of multiple high-throughput assays. When applied to a single assay, the proposed method is more robust and reliable than conventional methods such as Student's t-test or the Wilcoxon rank-sum test. When used to integrate multiple assays, integIRTy can identify novel altered genes that cannot be found by examining each assay separately. We applied integIRTy to three public cancer datasets (ovarian carcinoma, breast cancer and glioblastoma) for cross-assay integration, and all three show encouraging results.

Availability and implementation: The R package integIRTy is available at the web site http://bioinformatics.mdanderson.org/main/OOMPA:Overview.

Contact: kcoombes@mdanderson.org.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1.
Illustration of item characteristic curve (ICC). (a) Exemplar ICC with a difficulty level of 0.5 and discrimination 1. (b) ICCs from real data. The first four OV patient samples in each assay are shown here.
Fig. 2.
Pair-wise smoothed density (darker cloud indicates higher density) of estimated latent trait for alteration (upper panels) and Spearman rank correlations (lower panels) among different assays and integrated data. When normal control samples are available for all assays, we also show the correlations of computed P-values from conventional methods in brackets. (a) OV dataset. (b) BRCA dataset. (c) GBM dataset.
Fig. 3.
Relations between integrated and individual gene lists in OV data. We selected the top (100–1000) genes from the integrated analysis and from individual assays (E, expression; M, methylation; C, copy number). Each bar is equivalent to a Venn diagram showing how many of the top genes from the integrated analysis came from one, two (EM, expression and methylation; EC, expression and copy number; MC, methylation and copy number) or all three (EMC) individual assay gene lists. Black regions and numbers at the top of each bar count the number of ‘novel’ genes that only appear on the list from the integrated analysis.
Fig. 4.
Example genes with discordant calls between the conventional method and our method. The original measurement is plotted against sample index after sorting by tissue type and batch number. Red circles indicate altered values based on dichotomized data; green circles indicate unaltered values. The expression values from normal control samples are indicated by solid green dots. Black solid lines represent the tumor and normal means. Dashed lines denote the component means estimated from a two-component mixture. In the panel titles, we show gene symbol, latent ability, percentage of tumor samples altered (rate) and conventional test P-value. (a) A typical gene missed by the t-test but identified by our method. The bimodality index (BI) shown in the title strongly suggests that a subgroup of the tumor samples has a large magnitude of overexpression compared to normal samples and hence that the gene is likely to be altered. (b) A gene missed by our method but flagged by the t-test. This is an example where statistical significance does not imply biological significance. The difference between tumor and normal samples is minor. As a result, our method makes the correct decision. (c) A typical gene missed by the rank test but flagged by our method. More than 50% of the tumor samples have increased methylation, which strongly suggests altered methylation. (d) A gene missed by our method but flagged by the rank test. The trend of beta value here is mostly due to batch effect, not biological difference. None of the tumor or normal samples is methylated (β < 0.25). Accordingly, our method assigns a very low latent trait estimate. In comparison, the conventional method reports a strong statistical difference between tumor and normal simply due to batch effect.
Fig. 5.
Complementary information provided by integIRTy and CNAmet. (a) Overexpression in tumor where the regulation by methylation and CN is not synergistic. As a result, CNAmet fails to detect it. (b) Mild overexpression mainly driven by CN gain. integIRTy did not detect this gene due to the high background CN change. (c) Overexpression in tumor samples driven by hypomethylation and CN gain. Genes like this are easily detected by both methods. (d) Expression is turned off in both tumor and normal samples due to hypermethylation. Since there is little difference between tumor and normal, both methods suggest it is not altered.
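
The exemplar ICC in Fig. 1(a) can be sketched numerically. The snippet below is a minimal illustration, not the integIRTy implementation: it assumes the standard two-parameter logistic (2PL) form of the item characteristic curve, and the function name `icc` and its parameter names are hypothetical. Only the difficulty (0.5) and discrimination (1) values come from the caption.

```python
import math


def icc(theta: float, discrimination: float = 1.0, difficulty: float = 0.5) -> float:
    """Probability of an 'altered' response at latent trait theta.

    Assumes a 2PL logistic curve (an assumption, not taken from the source):
        P(theta) = 1 / (1 + exp(-a * (theta - b)))
    with a = discrimination and b = difficulty.
    """
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))


if __name__ == "__main__":
    # Evaluate the curve at a few latent trait values.
    for theta in (-2.0, 0.0, 0.5, 2.0):
        print(f"theta={theta:+.1f}  P(altered)={icc(theta):.3f}")
```

Under this assumed form, the curve crosses a probability of 0.5 exactly where the latent trait equals the difficulty, and a larger discrimination makes the transition around that point steeper.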
