Nat Commun. 2017 Jul 3;8:15932.
doi: 10.1038/ncomms15932.

A transcriptomics data-driven gene space accurately predicts liver cytopathology and drug-induced liver injury


Pekka Kohonen et al. Nat Commun.

Abstract

Predicting unanticipated harmful effects of chemicals and drug molecules is a difficult and costly task. Here we use a 'big data compacting and data fusion' concept to capture diverse adverse outcomes at the cellular and organismal levels. From a transcriptomics data set, the approach generates a 'predictive toxicogenomics space' (PTGS) tool composed of 1,331 genes distributed over 14 overlapping cytotoxicity-related gene space components. Constructed and validated using ∼2.5 × 10⁸ data points and 1,300 compounds, the PTGS tool serves to explain dose-dependent cytotoxicity effects; provide a virtual cytotoxicity probability estimate intrinsic to omics data; predict chemically induced pathological states in the liver resulting from repeated dosing of rats; and predict human drug-induced liver injury (DILI) from hepatocyte experiments. In an analysis of 68 DILI-annotated drugs, the PTGS tool outperforms and complements existing tests, reaching a previously unattained level of DILI prediction accuracy.


Conflict of interest statement

J. Parkkinen, P. Kohonen, S. Kaski and R.C. Grafström declare a ‘personal financial interest’ as equal contributors to a patent application. All other authors declare no competing financial interests.

Figures

Figure 1. Generating the Predictive Toxicogenomics Space (PTGS) concept.
(a) The probabilistic component modelling underlying the PTGS scoring concept used latent Dirichlet allocation. This unsupervised method uncovers common themes that describe collections of profiles by seeking associations between compound treatments (‘instances’) and the differential expression of gene sets; this reduces the data and yields components that can be used to quantitatively classify new gene expression profiles. (b) Probabilistic modelling of transcriptomics and cytotoxicity data from exposed cells was used to identify specific component models representing mechanistic aspects of the responses and the genes activated by the treatments. Scores derived either from the models or from the gene set encapsulated by the PTGS serve to predict various types of dose-dependent cytotoxicity effects; the analysis steps are presented in detail in Supplementary Fig. 1. Validation of the PTGS scoring concept encompassed three parts: bioinformatics-driven assessment of the component-associated genes relative to genes known to be cytotoxicity-related; generation of cellular cytotoxicity screening data to compare the omics-based PTGS with quantitative structure-activity relationship (QSAR) analysis; and, finally, assessment of the in vitro to in vivo extrapolation applicability of the PTGS against the Open 'Toxicogenomics Project-Genomics Assisted Toxicity Evaluation system' (TG-GATEs) in two ways, namely prediction of histopathology in rats subjected to repeat-dose toxicity studies and prediction of human drug-induced liver injury from human and rat hepatocytes. The numbers of compounds assessed within each omics data set used to establish the PTGS and to validate the concept are indicated.
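Panel (a) relies on latent Dirichlet allocation (LDA). As a minimal sketch, assuming scikit-learn's `LatentDirichletAllocation` and a toy 'bag of genes' encoding in which each instance is treated as a document and its differential-expression calls as words, the decomposition into instance-component and component-gene distributions can be illustrated as follows. The data are random and hypothetical, and both the encoding and the number of components (4 here, 14 in the PTGS) are assumptions, not the study's pipeline.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

# Hypothetical toy data: 40 compound-treatment "instances" x 100 genes.
# Each entry is a count of differential-expression calls for that gene in that
# instance (an assumed "bag of genes" encoding, not the study's exact one).
rng = np.random.default_rng(0)
n_instances, n_genes, n_components = 40, 100, 4
counts = rng.poisson(lam=1.0, size=(n_instances, n_genes))

lda = LatentDirichletAllocation(n_components=n_components, random_state=0)
# theta[i, k]: share of instance i attributed to component k (each row sums to 1)
theta = lda.fit_transform(counts)

# components_ holds unnormalised gene weights; normalise each row into a
# per-component distribution over genes
phi = lda.components_ / lda.components_.sum(axis=1, keepdims=True)

# Indices of the ten genes most strongly associated with each component
top_genes = np.argsort(phi, axis=1)[:, ::-1][:, :10]
for k in range(n_components):
    print(f"component {k}: top genes {top_genes[k].tolist()}")
```

Because each component is a distribution over all genes, the same gene can rank highly in several components, which is one way overlapping gene space components can arise.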
Figure 2. Generating the PTGS and establishing the cytotoxicity-scoring concept.
(a) Selecting the number of probabilistic components so as to retrieve as many biologically significant associations as possible with as few components as possible. (b) Selecting an optimal size of the PTGS based on cytotoxicity-predictive performance relative to the NCI-60 data. (c) The 14 PTGS components (labelled) ranked by their probability-weighted mean concentration-dependent cytotoxicity values (that is, log10(CMap concentration) − log10(GI50 concentration)) versus the number of associated genes. (d) Correlation of the number of differentially expressed genes with the concentration-dependent cytotoxicity. Colour and size indicate the amount of transcriptional variation explained by the PTGS, that is, the component-based score (n=492). (e,f) Instances with a small number of differentially expressed genes tend to have cytotoxicity below the total growth inhibition (TGI) level (blue oval), whereas (g) compounds profiled at cell-killing doses (>LC50) show greater differences (green circle). (h) Analysis of component-based PTGS scores versus concentration-dependent cytotoxicity was used to determine (i) a cut-off, plotted here against the proportion of instances above the GI50 level. The dashed red line indicates the threshold at the GI50 level, and the dashed black line the cut-off at 0.12, where ∼50% of CMap instances are above GI50. (j) The gene-based scoring, based on the proportion of active PTGS-related genes, was evaluated similarly. (k) The cut-off was set at 25% (cf. Supplementary Figs 4 and 5; for data, see Supplementary Data 1).
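Panels (c), (j) and (k) rest on two simple quantities: the concentration-dependent cytotoxicity, log10(CMap concentration) − log10(GI50), and a gene-based score given by the proportion of active PTGS genes, thresholded at 25%. The sketch below assumes that a PTGS gene counts as 'active' when it is called differentially expressed in the instance (the caption does not define 'active'); the function names, gene names and the 8-gene stand-in for the PTGS are hypothetical. The component-based score and its 0.12 cut-off require the fitted component model and are not reproduced here.

```python
import math

# Cut-off for the gene-based score taken from panel (k); whether a score exactly
# at the cut-off counts as "above" is an assumption here.
GENE_SCORE_CUTOFF = 0.25


def concentration_dependent_cytotoxicity(cmap_conc: float, gi50: float) -> float:
    """log10(CMap concentration) - log10(GI50), both in the same units.

    Positive values mean the instance was profiled above its GI50 level.
    """
    return math.log10(cmap_conc) - math.log10(gi50)


def gene_based_score(de_genes: set[str], ptgs_genes: set[str]) -> float:
    """Fraction of PTGS genes that are differentially expressed in one instance.

    Treating "active" as "called differentially expressed" is an assumption.
    """
    return len(de_genes & ptgs_genes) / len(ptgs_genes)


def predicted_above_gi50(score: float, cutoff: float = GENE_SCORE_CUTOFF) -> bool:
    return score >= cutoff


# Hypothetical example: an 8-gene stand-in for the 1,331-gene PTGS
ptgs = {f"G{i}" for i in range(1, 9)}
score = gene_based_score({"G1", "G2", "G3", "X9"}, ptgs)  # 3 of 8 PTGS genes -> 0.375
print(score, predicted_above_gi50(score))
print(concentration_dependent_cytotoxicity(10e-6, 1e-6))  # 10 µM vs 1 µM, about 1.0
```

A gene-based score needs only the list of differentially expressed genes per profile, which is why it can be applied to new data without refitting the component model.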
Figure 3. Validation of the PTGS using gene set enrichment analysis.
(a) ‘Eye diagram’ linking the genes of the 14 PTGS components (middle, colour) to the top 5 CMap instances (left) and to overrepresented toxicological functions (right). Line widths indicate association strengths. The components are sorted by similarity, as shown in Supplementary Fig. 3b; data in Supplementary Data 4. (b) Biological and toxicological complexity of the PTGS components, defined as the proportion of results (above a set statistical threshold) in each analysis category ascribed to the component gene set. Numbers above bars denote the number of genes in each component. Details of the data are in Supplementary Data 3–7. (c) Frequency plot of upstream regulator enrichments for the PTGS components, depicting multiple transcriptional regulators associated with stress responses, inflammation and cell division. For data and further related analyses, see Supplementary Fig. 6b and Supplementary Data 5.
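The caption refers to overrepresented functions without naming the statistical test. One common way to score overrepresentation of a function within a component's gene set is a one-sided hypergeometric test; the sketch below uses SciPy's `hypergeom` and is offered only as a generic illustration, not as the study's enrichment tool. The function name and all gene counts are hypothetical.

```python
from scipy.stats import hypergeom


def overrepresentation_p(universe_size: int, function_genes: int,
                         component_genes: int, overlap: int) -> float:
    """One-sided p-value P(X >= overlap) for a hypothetical component/function pair."""
    # hypergeom.sf(k - 1, M, n, N) = P(X >= k), with M = universe size,
    # n = genes annotated to the function, N = genes drawn (the component's gene set)
    return float(hypergeom.sf(overlap - 1, universe_size, function_genes, component_genes))


# Hypothetical numbers: 20,000 background genes, 200 annotated to a function,
# a 100-gene component, 10 of which carry the annotation (1 expected by chance)
p = overrepresentation_p(20_000, 200, 100, 10)
print(f"p = {p:.2e}")
```

In practice such p-values are usually corrected for multiple testing before applying a statistical threshold like the one mentioned in panel (b).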
Figure 4. High-throughput screening cell-based validation of PTGS to predict cytotoxicity in the CMap database.
(a) Cell survival measured in the three CMap cell lines at different concentration levels for 38 non-NCI-60 CMap compounds. (b) Concentration-dependent cytotoxicity values of 16 compounds (36 instances) showed agreement between the NCI-60-based test and the chosen cytotoxicity assay (ATP content) (Pearson correlation 0.86). The toxic versus non-toxic classification was reproduced in 32 of 36 instances, and the four instances whose classification changed had scores close to the cut-off in both data sets. (c) Proportions of CMap, CMap/NCI-60 crossover and validation (test) instances that the PTGS predicts to have been measured above the GI50 level show a balance of toxic and non-toxic treatments (numbers tested shown). About 25% of the 3,062 CMap profiles are predicted to be above the GI50 level. (d) ROC curves indicating the cytotoxicity-predictive performance of the gene-based, component-based and Partial Least Squares QSAR methods; the AUC values were 0.92 (n=80), 0.91 (n=91) and 0.64 (n=85), respectively. Further details of the QSAR analysis are in Supplementary Fig. 7. For screening data, see Supplementary Data 9.
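Panel (d) compares methods by the area under the ROC curve (AUC). A short sketch of how an AUC can be computed from scores and binary labels, using the Mann-Whitney formulation: the probability that a randomly chosen toxic instance scores higher than a randomly chosen non-toxic one, with ties counted as one half. The scores and labels are hypothetical, and the caption does not state which software produced the published curves.

```python
def roc_auc(scores, labels):
    """AUC as P(score of random positive > score of random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative instance")
    wins = 0.0
    # Compare every positive (e.g. measured above GI50) with every negative
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores for four instances; 1 = toxic (above GI50), 0 = non-toxic
print(roc_auc([0.9, 0.2, 0.6, 0.1], [1, 1, 0, 0]))  # 3 of 4 pairs ordered correctly: 0.75
```

The pairwise loop is quadratic, which is unproblematic at the sample sizes in panel (d) (80 to 91 instances); larger data sets would typically use a rank-based computation instead.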
Figure 5. Validation of the PTGS using in vitro and in vivo profiles from the TG-GATEs toxicogenomics database.
The proportion of treatments exceeding the virtual GI50 level (dashed line) increases with dose in human hepatocytes measured at (a) 8 h (n=388) and (b) 24 h (n=394), and in (c) human (n=388) or (d) rat (n=419) hepatocytes measured at 8 h, using either the component-based (a,b) or the gene-based (c,d) method (Supplementary Data 10, 11). The PTGS DILI score (for analyses, see Supplementary Data 12–15 and Supplementary Fig. 8), defined as the score given by the most sensitive component from among G, H, I and N, (e,f) predicts the severity grade (denoted by colour) and covers 17 different types of histopathological change observed in rats given repeated doses for up to 28 days. (g) Separation between positive and negative classes increases with the severity of histopathological change, from present to severe; n=463, 448, 282, 116 and 30 of 1,689 total. (h) Defining a threshold for the score above which more than 50% of the observations have histopathological changes present (dashed line). (i) The ability of the PTGS to predict clinical exposure levels raising DILI concerns was tested and compared with other in vitro assays. Numbers of matching compounds with rat hepatocyte data are indicated inside red bars. The PTGS alone outperforms the other approaches and, in combination with other hepatocellular-based assays, achieved a positive predictive ability of 72–86% without a loss of specificity (further details in Supplementary Fig. 9 and Supplementary Data 16–18).
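The caption defines the PTGS DILI score through components G, H, I and N and reports positive predictive ability alongside specificity. A minimal sketch under two stated assumptions: the 'most sensitive' component is read as the one returning the highest score, and 'positive predictive ability' is computed as the standard positive predictive value, TP/(TP+FP). The function names, component scores and drug calls below are hypothetical.

```python
from typing import Mapping, Sequence

# Components named in the caption as feeding the PTGS DILI score
DILI_COMPONENTS = ("G", "H", "I", "N")


def ptgs_dili_score(component_scores: Mapping[str, float]) -> float:
    # Assumption: the "most sensitive" component is read as the one giving the highest score
    return max(component_scores[c] for c in DILI_COMPONENTS)


def ppv_and_specificity(predicted: Sequence[bool], actual: Sequence[bool]) -> tuple[float, float]:
    """Positive predictive value TP/(TP+FP) and specificity TN/(TN+FP)."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    tn = sum(1 for p, a in zip(predicted, actual) if not p and not a)
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return ppv, specificity


# Hypothetical component scores for one treatment; component "A" is ignored
print(ptgs_dili_score({"A": 0.9, "G": 0.1, "H": 0.3, "I": 0.2, "N": 0.05}))  # 0.3

# Hypothetical calls (True = flagged / DILI concern) for five drugs
ppv, spec = ppv_and_specificity([True, True, False, False, True],
                                [True, False, False, True, True])
print(ppv, spec)
```

Reporting PPV together with specificity, as the caption does, guards against a combined test that raises PPV simply by flagging fewer compounds at the cost of more false positives elsewhere.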
