PLoS Comput Biol. 2024 Jan 11;20(1):e1011717.
doi: 10.1371/journal.pcbi.1011717. eCollection 2024 Jan.

Tissue-adjusted pathway analysis of cancer (TPAC): A novel approach for quantifying tumor-specific gene set dysregulation relative to normal tissue

H Robert Frost. PLoS Comput Biol. 2024.

Abstract

We describe a novel single-sample gene set testing method for cancer transcriptomics data named tissue-adjusted pathway analysis of cancer (TPAC). The TPAC method leverages information about the normal tissue-specificity of human genes to compute a robust multivariate distance score that quantifies gene set dysregulation in each profiled tumor. Because the null distribution of the TPAC scores has an accurate gamma approximation, both population-level and sample-level inference are supported. As we demonstrate through an analysis of gene expression data for 21 solid human cancers from The Cancer Genome Atlas (TCGA) and associated normal tissue expression data from the Human Protein Atlas (HPA), TPAC gene set scores are more strongly associated with patient prognosis than the scores generated by existing single-sample gene set testing methods.
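
The general idea of scoring a tumor by its distance from a normal-tissue reference and then using a gamma approximation for inference can be illustrated with a rough sketch. This is not the TPAC implementation: it assumes a simple distance (a sum of squared deviations from a normal-tissue mean, each scaled by a per-gene variance) and a method-of-moments gamma fit to a set of null scores. All function names and inputs are hypothetical.

```python
import math
from statistics import mean, variance


def distance_score(tumor, normal_mean, gene_var):
    """Assumed distance: squared deviation of each gene in the set from a
    normal-tissue reference value, scaled by that gene's variance, summed."""
    return sum((t - m) ** 2 / v for t, m, v in zip(tumor, normal_mean, gene_var))


def fit_gamma_moments(null_scores):
    """Method-of-moments gamma fit: shape = mean^2 / var, scale = var / mean."""
    m = mean(null_scores)
    v = variance(null_scores)  # sample variance
    return m * m / v, v / m


def gamma_upper_tail(x, shape, scale):
    """P(X >= x) for a gamma distribution, via the power series for the
    regularized lower incomplete gamma function. Intended for moderate x/scale."""
    if x <= 0:
        return 1.0
    z = x / scale
    term = 1.0 / shape
    total = term
    n = 1
    while term > 1e-16 * total and n < 100000:
        term *= z / (shape + n)
        total += term
        n += 1
    log_prefactor = shape * math.log(z) - z - math.lgamma(shape)
    return max(0.0, 1.0 - total * math.exp(log_prefactor))


def sample_p_value(score, null_scores):
    """Sample-level p-value for one tumor's score under the fitted gamma null."""
    shape, scale = fit_gamma_moments(null_scores)
    return gamma_upper_tail(score, shape, scale)
```

Under these assumptions, a larger score yields a smaller upper-tail p-value, which is what allows a per-tumor significance call for each gene set.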

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. TPAC empirical power for different simulated effect sizes.
The simulation model is detailed in section ‘Inference with TPAC scores maintains type I error control’.
Fig 2. Heatmap illustrating the pan-cancer distribution of S matrix TPAC scores for the MSigDB Hallmark gene sets.
Annotations along the top reflect cancer type and annotations on the bottom represent the four main dysregulation patterns (1: overall dysregulation across all gene sets, 2: minimal dysregulation, 3: immune signaling dysregulation, and 4: proliferation dysregulation).
Fig 3. Q-Q plot comparing p-values from Cox proportional hazards models that use single sample gene set scores generated by TPAC and each of the comparison techniques as the single predictor and PFI as the outcome against the U(0, 1) distribution expected under the null.
For each evaluated method, a separate test is performed for all 50 MSigDB Hallmark pathways for each of the 21 analyzed TCGA cancer types for a total of 1,050 tests per method. The results for each evaluated single sample gene set testing method are plotted separately, with the number of hypothesis tests out of a family of 1,050 tests associated with FDR values ≤ 0.1 listed in parentheses after the method in the legend.
Fig 4. Q-Q plot comparing the distribution of p-values from Wilcoxon rank sum tests comparing single sample gene set scores generated by TPAC and each of the comparison techniques for tumors with stage T01 vs. the scores for tumors with higher stages.
For each evaluated method, a separate test is performed for all 50 MSigDB Hallmark pathways for each of the 21 analyzed TCGA cancer types for a total of 1,050 tests per method. The results for each evaluated single sample gene set testing method are plotted separately, with the number of hypothesis tests out of a family of 1,050 tests associated with FDR values ≤ 0.1 listed in parentheses after the method in the legend.
Fig 5. Q-Q plot comparing the distribution of p-values from Wilcoxon rank sum tests comparing single sample gene set scores generated by TPAC and each of the comparison techniques for tumors associated with lymph node stage N0 vs. the scores for tumors associated with higher lymph node stages.
For each evaluated method, a separate test is performed for all 50 MSigDB Hallmark pathways for each of the 21 analyzed TCGA cancer types for a total of 1,050 tests per method. The results for each evaluated single sample gene set testing method are plotted separately, with the number of hypothesis tests out of a family of 1,050 tests associated with FDR values ≤ 0.1 listed in parentheses after the method in the legend.
Fig 6. Illustration of the pan-cancer significance of S matrix TPAC scores for the MSigDB Hallmark gene sets.
This figure visualizes the same data as Fig 2 but with each TPAC score whose associated FDR value is ≥ 0.3 set to 0 (X total hypotheses).
Fig 7. Kaplan-Meier plot for the TCGA KIRP cohort and progression-free interval (PFI) outcome, with patients stratified according to the significance of the TPAC score for the MSigDB Hallmark MYC Targets V1 gene set.
Significance was determined according to whether the FDR value associated with the TPAC score was < 0.25 where the family of hypotheses included the TPAC scores for all 50 Hallmark gene sets for all 321 KIRP samples with PFI data (16,050 total hypotheses).
Fig 8. Association between transcription factor (TF) activity, as estimated using the decoupleR method, and TPAC scores.
Each cell represents the average rank correlation between overall TPAC scores and TF activity estimates for one of the TCGA cohorts where averaging is performed across all 50 MSigDB Hallmark gene sets. Results are only shown for the 25 TFs with the largest average absolute correlation.
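
The captions for Figs 3 to 5 report how many tests in a family of 1,050 have FDR values ≤ 0.1. The captions do not state which FDR procedure was used; the sketch below assumes the Benjamini-Hochberg adjustment, and the function names are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (assumed FDR procedure),
    returned in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def count_significant(pvalues, threshold=0.1):
    """Number of tests in the family whose adjusted value is <= threshold."""
    return sum(q <= threshold for q in bh_adjust(pvalues))
```

For one method, `pvalues` would hold the 1,050 p-values (50 gene sets for each of 21 cancer types), and `count_significant(pvalues)` would give a count comparable to the number shown in the legend, provided the same adjustment was used.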
