BMC Bioinformatics. 2021 Jan 12;22(1):22.
doi: 10.1186/s12859-020-03929-0.

Drug perturbation gene set enrichment analysis (dpGSEA): a new transcriptomic drug screening approach

Mike Fang et al. BMC Bioinformatics.

Abstract

Background: In this study, we demonstrate that our modified Gene Set Enrichment Analysis (GSEA) method, drug perturbation GSEA (dpGSEA), can detect phenotypically relevant drug targets through a unique transcriptomic enrichment that emphasizes biological directionality of drug-derived gene sets.

Results: We detail our dpGSEA method and show its effectiveness in detecting specific drug perturbations in independent public datasets by confirming fluvastatin, paclitaxel, and rosiglitazone perturbation in gastroenteropancreatic neuroendocrine tumor cells. In drug discovery experiments, dpGSEA detected phenotypically relevant drug targets in previously published differentially expressed genes of CD4+ T regulatory cells from immune responders and non-responders to antiviral therapy in HIV-infected individuals, including targets involved in virion replication, cell cycle dysfunction, and mitochondrial dysfunction. dpGSEA is publicly available at https://github.com/sxf296/drug_targeting.

Conclusions: dpGSEA is an approach that uniquely enriches on drug-defined gene sets while considering directionality of gene modulation. We recommend dpGSEA as an exploratory tool to screen for possible drug targeting molecules.

Keywords: Drug discovery; Gene set enrichment analysis; Transcriptomics.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Comparisons between dpGSEA and similar approaches. a dpGSEA’s primary differences from GSEA are its use of a priori gene set information derived from the Broad Institute’s Connectivity Map (CMAP) and Library of Integrated Network-Based Cellular Signatures (LINCS) projects, organized as proto-matrices; ranking by absolute statistical significance rather than by fold change; and a novel statistic to evaluate the drug target. Both approaches use a random walk running sum statistic to calculate enrichment scores. dpGSEA requires two inputs from the user to run. b dpGSEA is listed with comparable techniques that use GSEA-like approaches. Our approach uses both the significance and the directionality of each gene and generates a novel statistic, the target compatibility score. We also show the driver genes for each drug.
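To make the shared "random walk running sum" concrete, here is a minimal sketch of the classic weighted form from the original GSEA method: walking down a ranked list, each gene-set member (hit) steps the sum up in proportion to |score|^weight, normalized over all hits, and each non-member steps it down by a constant. This is not dpGSEA's code; the function name and the `weight` parameter are our own. `scores` stands in for whichever ranking metric is used (fold change for GSEA, absolute significance for dpGSEA, per the caption).

```python
from typing import List, Sequence, Set, Tuple


def running_sum_es(ranked_genes: Sequence[str], scores: Sequence[float],
                   gene_set: Set[str], weight: float = 1.0
                   ) -> Tuple[float, int, List[float]]:
    """Classic GSEA-style weighted running sum (illustrative sketch).

    ranked_genes must already be sorted by the ranking metric (best first),
    with scores[i] the metric for ranked_genes[i].
    Returns (ES, 0-based index of the maximum deviation, running-sum trace).
    """
    hits = [g in gene_set for g in ranked_genes]
    n, n_hits = len(hits), sum(hits)
    if n_hits == 0 or n_hits == n:
        raise ValueError("gene set must be a proper, non-empty subset of the list")
    # Hits step up in proportion to |score|**weight, normalized over all hits.
    hit_norm = sum(abs(s) ** weight for s, h in zip(scores, hits) if h)
    if hit_norm == 0:
        raise ValueError("all hit scores are zero")
    # Each miss steps down by the same amount, so the walk ends at (about) zero.
    miss_step = 1.0 / (n - n_hits)
    running, trace = 0.0, []
    for s, h in zip(scores, hits):
        running += (abs(s) ** weight) / hit_norm if h else -miss_step
        trace.append(running)
    # ES is the running-sum value furthest from zero (positive or negative).
    peak = max(range(n), key=lambda i: abs(trace[i]))
    return trace[peak], peak, trace
```

For example, with genes A to E scored 3, 2, 1, −1, −2 and the set {A, B}, the walk rises to 1.0 after B and then decays, so ES is 1.0 at index 1.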
Fig. 2
Overview of the dpGSEA pipeline and enrichment approach. Beginning from the left side of the diagram, the two primary inputs of dpGSEA are shown as tables. The top left table lists differentially expressed genes (DEGs) from, for example, a disease versus control study. The bottom left table contains the proto-matrix, which is analogous to MSigDB-defined gene sets but contains a list of drug-gene actions rather than a plain gene set. dpGSEA merges the information in these tables by gene and ranks the genes by the absolute value of their significance. dpGSEA then estimates a running sum statistic based on drug-gene interaction and regulation. Highlighted in yellow are negatively correlated drug-gene interactions (opposing arrows). The maximum deviation of the running sum statistic defines the enrichment score (ES; dotted red line), while the position of the maximum deviation (dotted orange line) represents the target compatibility score (TCS). dpGSEA then permutes the gene positions and generates new enrichment distributions, yielding null ES and TCS values. The permutations are used both to normalize each score and to estimate its statistical significance. The output is a list of drugs ranked by the statistical significance of their ES or TCS (bottom center table). Leading-edge genes are also included in the output (not shown).
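The pipeline in this caption can be sketched end to end as follows. This is a hypothetical illustration, not the dpGSEA implementation: the input shapes (`degs`, `proto`), the rule that a gene is a hit when the drug's direction opposes the gene's DE direction, the unweighted running sum, the TCS as the fractional rank position of the peak, the NES as ES divided by the mean null ES, and the +1-corrected one-sided permutation p values are all our assumptions. Only the merge-by-gene, rank-by-absolute-significance, directional matching, and permutation steps come from the caption.

```python
import math
import random
from typing import Dict, List, Tuple

# Hypothetical input shapes (not dpGSEA's actual file formats):
#   degs:  gene -> (log fold change, p value), e.g. disease vs control
#   proto: drug -> {gene: +1 if the drug up-regulates it, -1 if it down-regulates it}
Degs = Dict[str, Tuple[float, float]]
Proto = Dict[str, Dict[str, int]]


def rank_by_significance(degs: Degs) -> List[str]:
    """Rank genes by significance alone (smallest p first), ignoring fold change."""
    return sorted(degs, key=lambda g: degs[g][1])


def hit_flags(ranked: List[str], degs: Degs, targets: Dict[str, int],
              oppose: bool = True) -> List[bool]:
    """A gene is a hit when the drug's direction opposes (oppose=True) or
    matches (oppose=False) the gene's direction of differential expression."""
    flags = []
    for g in ranked:
        if g not in targets:
            flags.append(False)
            continue
        de_sign = 1 if degs[g][0] > 0 else -1
        wanted = -de_sign if oppose else de_sign
        flags.append(targets[g] == wanted)
    return flags


def enrichment(flags: List[bool]) -> Tuple[float, int]:
    """Unweighted running sum: returns (maximum value, 0-based index of it)."""
    n, n_hits = len(flags), sum(flags)
    if n_hits == 0 or n_hits == n:
        return 0.0, n - 1  # degenerate case: no hit/miss contrast to score
    up, down = 1.0 / n_hits, 1.0 / (n - n_hits)
    running, best, best_idx = 0.0, -math.inf, 0
    for i, hit in enumerate(flags):
        running += up if hit else -down
        if running > best:
            best, best_idx = running, i
    return best, best_idx


def score_drugs(degs: Degs, proto: Proto, n_perm: int = 1000,
                seed: int = 0) -> List[dict]:
    """Score every drug, then rank drugs by their ES permutation p value."""
    rng = random.Random(seed)
    ranked = rank_by_significance(degs)
    n = len(ranked)
    results = []
    for drug, targets in proto.items():
        flags = hit_flags(ranked, degs, targets)
        es, idx = enrichment(flags)
        tcs = (idx + 1) / n  # hypothetical TCS: fractional position of the peak
        null_es, null_tcs = [], []
        for _ in range(n_perm):
            perm = flags[:]
            rng.shuffle(perm)  # permute gene positions, keeping the hit count
            e, j = enrichment(perm)
            null_es.append(e)
            null_tcs.append((j + 1) / n)
        mean_null = sum(null_es) / n_perm
        results.append({
            "drug": drug,
            "ES": es,
            "NES": es / mean_null if mean_null > 0 else float("nan"),
            "ES_p": (1 + sum(e >= es for e in null_es)) / (n_perm + 1),
            "TCS": tcs,
            # earlier peaks are treated as more compatible (assumption)
            "TCS_p": (1 + sum(t <= tcs for t in null_tcs)) / (n_perm + 1),
        })
    return sorted(results, key=lambda r: r["ES_p"])
```

In a toy run where a drug opposes the direction of the two most significant DEGs, its running sum peaks at 1.0 early in the list and it ranks ahead of a drug whose opposing genes sit at the bottom of the list.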
Fig. 3
Comparisons between dpGSEA and GSEA show statistically significant differences between enrichment results. Plots a–d show trends and comparisons between dpGSEA (a, c) and GSEA (b, d) for the top 20 and top 50 p value ranked proto-matrices (derived from LINCS data) identifying positively correlated genes. Plots a and b compare the top 20 ranked proto-matrices between dpGSEA and GSEA, with each point representing an enriched drug in the final generated list. The labeled blue points all denote paclitaxel-perturbed cell lines in the differential expression (DE) comparison of paclitaxel-perturbed versus DMSO control GEPNT cells. The x-axis represents −log10 of the enrichment score (ES) p value and the y-axis −log10 of the target compatibility score (TCS) p value of the corresponding perturbed drug–cell line combination. The sub-axes list the order of ascending significance for ES (x-axis) and TCS (y-axis), which is also shown in tables e. The tables compare the ranked orders for both ES and TCS using Wilcoxon signed rank test p values, indicating differences between dpGSEA and traditional GSEA results.
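As a rough illustration of the rank comparison in tables e, the sketch below runs a two-sided paired Wilcoxon signed-rank test on the ranks that the same drugs receive under two methods. It uses the large-sample normal approximation without a tie correction to the variance; the caption does not say which variant the authors used, and `scipy.stats.wilcoxon` provides exact and approximate versions if SciPy is available. The function name and the example ranks are hypothetical.

```python
import math
from typing import Sequence, Tuple


def wilcoxon_signed_rank(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Paired two-sided Wilcoxon signed-rank test (normal approximation).

    Zero differences are dropped; tied |differences| receive average ranks.
    Returns (W, p) where W is the smaller of the positive and negative rank sums.
    """
    if len(x) != len(y):
        raise ValueError("x and y must be paired (same length)")
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    # Rank absolute differences (1-based), averaging ranks over ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j + 2) / 2.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)
    # Under H0, W has mean n(n+1)/4 and variance n(n+1)(2n+1)/24.
    mean = n * (n + 1) / 4.0
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return w, min(p, 1.0)
```

Here `x` and `y` would hold, for example, each drug's position in the dpGSEA list and in the GSEA list; with few drugs or many ties, an exact test is the safer choice.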
Fig. 4
Comparisons of score and significance trends for dpGSEA, the native CMAP approach, and gene2drug. Each point within each plot represents a screened drug’s significance and score within an equivalent run of dpGSEA (plots a, b), native CMAP (plot c), and gene2drug (plot d). Screened drugs that pass a designated FDR significance threshold (0.05) are shown in red, and screened drugs highlighted in green are statistically significant only under dpGSEA’s novel TCS. The total numbers of screened drugs within specific significance thresholds are also shown. Note that 121 drugs pass FDR α = 0.05 under the GSEA-defined FDR (plots a, b), whereas 3 and 0 pass the Benjamini–Hochberg (BH) threshold for CMAP and gene2drug, respectively (plots c, d).
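The BH threshold applied to CMAP and gene2drug is the Benjamini–Hochberg procedure. A minimal, generic sketch of BH-adjusted p values follows; it is not the code used by CMAP, gene2drug, or dpGSEA, and it does not reproduce the GSEA-defined FDR, which is estimated differently, from permutation null distributions of normalized enrichment scores. The function name is our own.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down; p * m / rank, kept monotone and <= 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Counting the adjusted values at or below 0.05 gives the number of drugs passing FDR α = 0.05 under BH; for p values 0.01, 0.04, 0.03, and 0.005 the adjusted values are 0.02, 0.04, 0.04, and 0.02, so all four pass.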
