Review
Chem Res Toxicol. 2022 Nov 21;35(11):1929-1949.
doi: 10.1021/acs.chemrestox.2c00245. Epub 2022 Oct 27.

Navigating Transcriptomic Connectivity Mapping Workflows to Link Chemicals with Bioactivities

Imran Shah et al. Chem Res Toxicol.

Abstract

Screening new compounds for potential bioactivities against cellular targets is vital for drug discovery and chemical safety. Transcriptomics offers an efficient approach for assessing global gene expression changes, but interpreting chemical mechanisms from these data is often challenging. Connectivity mapping is a potential data-driven avenue for linking chemicals to mechanisms based on the observation that many biological processes are associated with unique gene expression signatures (gene signatures). However, mining the effects of a chemical on gene signatures for biological mechanisms is challenging because transcriptomic data contain thousands of noisy genes. New connectivity mapping approaches seeking to distinguish signal from noise continue to be developed, spurred by the promise of discovering chemical mechanisms, new drugs, and disease targets from burgeoning transcriptomic data. Here, we analyze these approaches in terms of diverse transcriptomic technologies, public databases, gene signatures, pattern-matching algorithms, and statistical evaluation criteria. To navigate the complexity of connectivity mapping, we propose a harmonized scheme to coherently organize and compare published workflows. We first standardize concepts underlying transcriptomic profiles and gene signatures based on various transcriptomic technologies such as microarrays, RNA-Seq, and L1000, and discuss the widely used data sources such as Gene Expression Omnibus, ArrayExpress, and MSigDB. Next, we generalize connectivity mapping as a pattern-matching task for finding similarity between a query (e.g., the transcriptomic profile of a new chemical) and a reference (e.g., the gene signature of a known target). Published pattern-matching approaches fall into two main categories: vector-based approaches, which use metrics such as correlation and the Jaccard index, and aggregation-based approaches, which use parametric and nonparametric statistics (e.g., gene set enrichment analysis). The statistical methods for evaluating the performance of different approaches are described, along with comparisons reported in the literature on benchmark transcriptomic data sets. Lastly, we review connectivity mapping applications in toxicology and offer guidance on evaluating chemical-induced toxicity with concentration-response transcriptomic data. In addition to serving as a high-level guide and tutorial for understanding and implementing connectivity mapping workflows, we hope this review will stimulate new algorithms for evaluating chemical safety and drug discovery using transcriptomic data.
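
The two vector-based metrics named in the abstract can be illustrated with a minimal sketch. The gene names and L2FC values below are hypothetical toy data, and the helper functions `jaccard` and `pearson` are written here for illustration only; they are not taken from the review or from any specific connectivity mapping tool.

```python
from math import sqrt

def jaccard(a: set, b: set) -> float:
    """Jaccard index |A & B| / |A | B| between two gene sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def pearson(u: list, v: list) -> float:
    """Pearson correlation between two equal-length expression vectors."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    sd_u = sqrt(sum((a - mean_u) ** 2 for a in u))
    sd_v = sqrt(sum((b - mean_v) ** 2 for b in v))
    # A constant vector has no defined correlation; report 0.0 in that case.
    return cov / (sd_u * sd_v) if sd_u and sd_v else 0.0

# Hypothetical toy data: a query and a reference compared two ways.
query_genes = {"G1", "G2", "G3", "G4"}
reference_genes = {"G3", "G4", "G5"}
set_similarity = jaccard(query_genes, reference_genes)  # 2 shared genes out of 5

query_l2fc = [1.0, 2.0, -1.0, -2.0]
reference_l2fc = [0.5, 1.0, -0.5, -1.0]
vector_similarity = pearson(query_l2fc, reference_l2fc)
```

The Jaccard index compares signatures as sets and ignores the magnitude of expression changes, whereas correlation compares full profiles gene by gene, so the two metrics suit different query and reference representations.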

Conflict of interest statement

Competing financial interests: The authors declare they have no actual or potential competing financial interests.

Conflict of Interest

The authors declare no conflict of interest. This manuscript has been reviewed by the Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, and approved for publication. Approval does not signify that the contents reflect the views of the Agency, nor does mention of trade names or commercial products constitute endorsement or recommendation for use.

Figures

Figure 1.
Overview of connectivity mapping as a pattern matching between (a) query and (b) reference using a (c) similarity measure illustrating different types of matches (f, g, and h). (a) The query is a directional gene signature ($DS = \{S^{+}, S^{-}\}$) signified by a set of up-regulated genes ($S^{+}$, shown as red circles) and a set of down-regulated genes ($S^{-}$, shown as blue circles). (b) The reference is a transcriptomic profile $x$ shown as a vector of log2 transformed fold-change (L2FC) values for each gene (blue and red colors represent down- and up-regulation, respectively). (c) The similarity measure ($SM$) for scoring the match between $DS$ and $x$. (d) A collection of predefined signatures representing sets of genes (e.g., involved in pathways). (e) A collection of transcriptomic profiles for a set of perturbagens. (f) “Positive connection” between $DS$ and $x$ when $S^{+}$ and $S^{-}$ are correlated with up- and down-regulated genes in $x$. A positive connection is a match found when $SM_{DS,x} > 0$. (g) “No connection” between $DS$ and $x$ when $S^{+}$ and $S^{-}$ are uncorrelated with up- and down-regulated genes in $x$ (where $SM_{DS,x} \approx 0$). (h) “Negative connection” between $DS$ and $x$ when $S^{+}$ and $S^{-}$ are anti-correlated with up- and down-regulated genes in $x$. A negative connection is a match found when $SM_{DS,x} < 0$.
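
The sign convention in panels (f) to (h) can be illustrated with a minimal sketch. The score used here, the mean L2FC of the up-regulated set minus the mean L2FC of the down-regulated set, is an assumption chosen for simplicity and is not the similarity measure prescribed by the review; the function name `directional_score` and the toy profile are hypothetical.

```python
def directional_score(s_up: set, s_down: set, profile: dict) -> float:
    """Toy SM(DS, x): mean L2FC over S+ minus mean L2FC over S-.

    Positive when S+ genes are up and S- genes are down in x,
    negative when the pattern is reversed, and near zero otherwise.
    """
    up = [profile[g] for g in s_up if g in profile]
    down = [profile[g] for g in s_down if g in profile]
    mean_up = sum(up) / len(up) if up else 0.0
    mean_down = sum(down) / len(down) if down else 0.0
    return mean_up - mean_down

# Hypothetical directional signature and reference profile (gene -> L2FC).
s_up, s_down = {"A", "B"}, {"C", "D"}
x = {"A": 2.0, "B": 1.0, "C": -1.5, "D": -0.5, "E": 0.0}

positive = directional_score(s_up, s_down, x)       # S+ up, S- down in x
x_reversed = {gene: -value for gene, value in x.items()}
negative = directional_score(s_up, s_down, x_reversed)  # pattern inverted
```

Flipping the sign of every L2FC value turns a positive connection into a negative one of equal magnitude, which is the behavior panels (f) and (h) describe.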
Figure 2.
Representing transcriptomic profiles and gene signatures. (a) A transcriptomic profile $x$ shown as a vector of log2 transformed fold-change (L2FC) values for each gene (blue and red colors represent down- and up-regulation, respectively). (b) An extreme transcriptomic profile $x_n$ is defined by selecting the $n$ most up- and down-regulated genes in $x$ (shown as red and blue squares, respectively). (c) A directional signature ($DS_n$) is defined by transforming all up- and down-regulated genes in $x_n$ to 1 and −1, respectively. The directional signature ($DS_n = \{S^{+}, S^{-}\}$) is signified by a set of up-regulated genes ($S^{+}$, shown as red circles) and a set of down-regulated genes ($S^{-}$, shown as blue circles). (d) A non-directional signature ($S_n$) is derived from $DS_n$ by ignoring the direction of expression changes for all genes (all genes are shown as black circles). (e) A pathway containing a collection of proteins can be represented by a set of genes (which encode the proteins) and defined as a non-directional signature ($S$). (f) A causal network comprised of interacting proteins can be represented simply by a collection of genes (which can be represented as $S$). (g) A transcriptomic database is a collection of $x$. (h) A bioactivity signature database is a collection formed by one or more of the following types of signatures: $x_n$, $DS$, $DS_n$, $S$, and $S_n$.
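
The derivations in panels (a) to (d) can be sketched in a few lines. The function names and the toy profile are hypothetical, and the choice to keep only genes with a strictly positive (or strictly negative) L2FC among the top (or bottom) n is an assumption made here so that the up- and down-regulated sets never overlap.

```python
def extreme_profile(profile: dict, n: int) -> dict:
    """x_n: the n most up-regulated and n most down-regulated genes of x."""
    if n <= 0:
        return {}
    ranked = sorted(profile, key=profile.get, reverse=True)  # highest L2FC first
    up = [g for g in ranked[:n] if profile[g] > 0]
    down = [g for g in ranked[-n:] if profile[g] < 0]
    return {g: profile[g] for g in up + down}

def directional_signature(x_n: dict) -> dict:
    """DS_n: map up-regulated genes to 1 and down-regulated genes to -1."""
    return {g: (1 if value > 0 else -1) for g, value in x_n.items()}

def non_directional_signature(ds_n: dict) -> set:
    """S_n: the genes of DS_n with the direction of change discarded."""
    return set(ds_n)

# Hypothetical transcriptomic profile x (gene -> L2FC).
x = {"A": 3.1, "B": 0.2, "C": -2.4, "D": 1.7, "E": -0.1, "F": -1.9}

x_n = extreme_profile(x, n=2)
ds_n = directional_signature(x_n)
s_n = non_directional_signature(ds_n)
```

Each step discards information: $x_n$ drops weakly changed genes, $DS_n$ drops magnitudes, and $S_n$ drops direction, which is why the choice of representation constrains which similarity measures can be applied.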
Figure 3.
Overview of connectivity mapping as a pattern matching between (a) transcriptomic profile query and (b) non-directional signature reference using a (c) similarity measure showing illustrative examples of matches (f, g, h, and i). (a) The query is a transcriptomic profile $x$ shown as a vector of log2 transformed fold-change (L2FC) values for each gene (blue and red colors represent down- and up-regulation, respectively). (b) The reference is a non-directional gene signature ($S$) signified by a set of genes (shown as black circles). (c) The similarity measure ($SM$) for scoring the match between $x$ and $S$. (d) A collection of transcriptomic profiles for a set of perturbagens. (e) A collection of predefined signatures representing sets of genes (e.g., involved in pathways). (f) “Connection” between $x$ and $S$ is when most up- and down-regulated genes in $x$ match $S$ (observed when $|SM_{x,S}| > 0$). (g) A “positive connection” between $x$ and $S$ is when mostly up-regulated genes in $x$ are present in $S$ (observed when $SM_{x,S} > 0$). (h) “No connection” between $x$ and $S$ is when genes in $S$ are randomly distributed across $x$ (where $SM_{x,S} \approx 0$). (i) A “negative connection” between $x$ and $S$ is when mostly down-regulated genes in $x$ are present in $S$ (observed when $SM_{x,S} < 0$).
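
As one illustration of an aggregation-based score for a non-directional signature, the sketch below computes an unweighted Kolmogorov-Smirnov-like running sum over the ranked profile, in the spirit of gene set enrichment analysis. It is a simplified stand-in rather than the published GSEA statistic or the review's method, and the function name and toy data are hypothetical.

```python
def enrichment_score(profile: dict, signature: set) -> float:
    """Signed maximum deviation of a running sum over genes ranked by L2FC.

    Walking from the most up-regulated to the most down-regulated gene,
    the sum rises at each signature gene and falls at every other gene.
    """
    ranked = sorted(profile, key=profile.get, reverse=True)
    hits = [gene in signature for gene in ranked]
    n_hit = sum(hits)
    n_miss = len(ranked) - n_hit
    if n_hit == 0 or n_miss == 0:
        return 0.0  # the score is undefined without both hits and misses
    running, best = 0.0, 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

# Hypothetical profile x (gene -> L2FC) and two non-directional signatures.
x = {"A": 3.0, "B": 2.0, "C": 1.0, "D": -1.0, "E": -2.0, "F": -3.0}

es_up = enrichment_score(x, {"A", "B"})    # signature sits among up-regulated genes
es_down = enrichment_score(x, {"E", "F"})  # signature sits among down-regulated genes
```

A signature concentrated at the top of the ranking yields a positive score, as in panel (g), and one concentrated at the bottom yields a negative score, as in panel (i).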
Figure 4.
Overview of connectivity mapping for estimating chemical concentration-dependent scores for a signature. (a) The query is a non-directional gene signature ($S$) signified by a set of genes (shown as black circles). (b) The reference is a transcriptomic profile $x$ shown as a vector of log2 transformed fold-change (L2FC) values for each gene (blue and red colors represent down- and up-regulation, respectively). (c) The similarity measure ($SM$) for scoring the match between $S$ and $x$. (d) A transcriptomic database ($R_x$) comprised of a collection of $x$ for multiple chemicals and concentrations (Conc). $R_x$ is visualized as a matrix in which the rows represent genes, the columns show chemical concentrations, and the values in each column are $x$. For example, the outlined box in the matrix signifies eight $x$, one for each of the concentrations of a chemical. (e) Concentration-response analysis of similarity scores between $S$ and $x$ ($SM_{S,x}$) for each $x$ of the chemical shown in (d). The ordinate and abscissa show the similarity scores and the concentrations of the chemical, respectively. A null distribution (Null Dist.) of similarity scores (shown on the right of the graph along the ordinate axis) is generated by permuting $R_x$ and calculating $SM_{S,x}$ for all random profiles. The standardized similarity scores ($Z$) (calculated using the null distribution and shown as “+” symbols) are analyzed by curve-fitting. The fitted concentration-response curve (blue) is used to estimate the benchmark concentration (BMC) corresponding to the benchmark response (BMR) value of $Z = 1$.
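
The standardization step in panel (e) can be sketched as follows. This is a minimal illustration under several assumptions: the similarity measure is taken to be the mean L2FC of the signature genes, the null distribution is built by shuffling L2FC values across genes within randomly chosen profiles, and $Z$ is the score minus the null mean divided by the null standard deviation. All function names and data are hypothetical, and the curve fitting used to estimate the BMC is not shown.

```python
import random
from statistics import mean, stdev

def score(signature: set, profile: dict) -> float:
    """Toy SM(S, x): mean L2FC of the signature genes in profile x."""
    return mean(profile[g] for g in signature)

def null_scores(signature: set, profiles: list, n_perm: int = 200, seed: int = 0) -> list:
    """Scores of random profiles made by shuffling L2FC values across genes."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    genes = sorted(profiles[0])
    scores = []
    for _ in range(n_perm):
        source = rng.choice(profiles)
        values = [source[g] for g in genes]
        rng.shuffle(values)
        scores.append(score(signature, dict(zip(genes, values))))
    return scores

def z_scores(signature: set, profiles_by_conc: dict, n_perm: int = 200, seed: int = 0) -> dict:
    """Standardize each concentration's score against the permutation null."""
    null = null_scores(signature, list(profiles_by_conc.values()), n_perm, seed)
    mu, sd = mean(null), stdev(null)
    if sd == 0:
        raise ValueError("null distribution has zero variance")
    return {conc: (score(signature, x) - mu) / sd for conc, x in profiles_by_conc.items()}

# Hypothetical data: two signature genes respond to concentration, six do not.
genes = [f"G{i}" for i in range(1, 9)]
background = [0.1, -0.2, 0.3, -0.1, 0.2, -0.3]
profiles_by_conc = {
    conc: dict(zip(genes, [effect, effect] + background))
    for conc, effect in [(0.1, 0.0), (1.0, 1.0), (10.0, 3.0)]
}

z = z_scores({"G1", "G2"}, profiles_by_conc)
```

In a full workflow, the resulting $Z$ values per concentration would be passed to a concentration-response model, and the BMC would be read off where the fitted curve crosses the benchmark response.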
