BMC Cancer. 2017 Aug 2;17(1):513.
doi: 10.1186/s12885-017-3500-5.

Improved anticancer drug response prediction in cell lines using matrix factorization with similarity regularization

Lin Wang et al. BMC Cancer.

Abstract

Background: Human cancer cell lines are used in research to study the biology of cancer and to test cancer treatments. Several large panels, each comprising several hundred human cancer cell lines, have recently been characterized with genomic and pharmacological data. The ability to predict drug responses from these pharmacogenomic data can facilitate the development of precision cancer medicines. Although several methods have been developed for drug response prediction, obtaining accurate predictions remains challenging.

Methods: Based on the observation that similar cell lines and similar drugs exhibit similar drug responses, we adopted a similarity-regularized matrix factorization (SRMF) method to predict anticancer drug responses of cell lines using the chemical structures of drugs and baseline gene expression levels in cell lines. Specifically, the chemical structural similarity of drugs and the gene expression profile similarity of cell lines were used as regularization terms, which were incorporated into the drug response matrix factorization model.
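The idea in the Methods paragraph can be illustrated with a minimal sketch. The abstract does not state the objective function or the optimizer, so everything specific here is an assumption made for illustration: the objective written in the docstring, the plain gradient-descent updates, the hypothetical function name `srmf_sketch`, all hyperparameter values, and the synthetic toy data. It is not the authors' implementation.

```python
import numpy as np


def srmf_sketch(Y, mask, S_drug, S_cell, rank=3, lam=0.1,
                lam_d=0.01, lam_c=0.01, lr=0.005, n_iter=2000, seed=0):
    """Gradient-descent sketch of similarity-regularized matrix factorization.

    Assumed objective (not given in the abstract):
        ||mask * (Y - U V^T)||_F^2 + lam * (||U||_F^2 + ||V||_F^2)
        + lam_d * ||S_drug - U U^T||_F^2 + lam_c * ||S_cell - V V^T||_F^2
    Rows of Y are drugs, columns are cell lines; mask is 1 where a
    response was measured and 0 where it is unknown.
    """
    rng = np.random.default_rng(seed)
    m, n = Y.shape
    U = 0.1 * rng.standard_normal((m, rank))  # latent coordinates of drugs
    V = 0.1 * rng.standard_normal((n, rank))  # latent coordinates of cell lines
    Y0 = np.where(mask > 0, Y, 0.0)           # unknown entries never contribute
    losses = []
    for _ in range(n_iter):
        R = mask * (U @ V.T - Y0)             # residual on observed entries only
        Ed = U @ U.T - S_drug                 # mismatch with drug similarity
        Ec = V @ V.T - S_cell                 # mismatch with cell line similarity
        losses.append((R ** 2).sum()
                      + lam * ((U ** 2).sum() + (V ** 2).sum())
                      + lam_d * (Ed ** 2).sum() + lam_c * (Ec ** 2).sum())
        # Gradients of the objective above (similarity matrices are symmetric).
        grad_U = 2 * R @ V + 2 * lam * U + 4 * lam_d * Ed @ U
        grad_V = 2 * R.T @ U + 2 * lam * V + 4 * lam_c * Ec @ V
        U -= lr * grad_U
        V -= lr * grad_V
    return U, V, losses


# Synthetic toy data (purely illustrative, not GDSC or CCLE).
rng = np.random.default_rng(42)
m, n, r = 8, 12, 3
U_true = rng.standard_normal((m, r))
V_true = rng.standard_normal((n, r))
Y = U_true @ V_true.T / np.sqrt(r)               # low-rank "response" matrix
mask = (rng.random((m, n)) < 0.8).astype(float)  # about 80% of entries observed
S_drug = np.corrcoef(U_true)                     # stand-in drug similarity
S_cell = np.corrcoef(V_true)                     # stand-in cell line similarity

U, V, losses = srmf_sketch(Y, mask, S_drug, S_cell)
Y_hat = U @ V.T  # reconstructed matrix, including the unobserved entries
```

In this sketch the entries of `Y_hat` where `mask` is 0 play the role of the newly predicted drug responses. The two similarity terms pull drugs with similar structures, and cell lines with similar expression profiles, toward nearby latent coordinates.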

Results: We first demonstrated the effectiveness of SRMF using a set of simulation data and compared it with two typical similarity-based methods. We then applied it to the Genomics of Drug Sensitivity in Cancer (GDSC) and Cancer Cell Line Encyclopedia (CCLE) datasets, where SRMF outperformed three state-of-the-art methods. We also applied SRMF to estimate the missing drug response values in the GDSC dataset. Even though SRMF does not explicitly model mutation information, it correctly predicted drug-cancer gene associations that are consistent with existing data, and it identified novel drug-cancer gene associations that are not found in existing data. SRMF can also aid in drug repositioning. The newly predicted drug responses for the GDSC dataset suggest that non-small cell lung cancer (NSCLC) cell lines are sensitive to the mTOR inhibitor rapamycin, and that expression of AK1RC3 and HINT1 may be adjunct markers of cell line sensitivity to rapamycin.

Conclusions: Our analysis showed that the proposed data integration method improves the accuracy of anticancer drug response prediction in cell lines, identifies drug-cancer gene associations that are both consistent with and novel relative to existing data, and can aid in drug repositioning.

Keywords: Anticancer drug response prediction; Drug repositioning; Matrix factorization; Precision cancer medicines.

Conflict of interest statement

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

Fig. 1
The framework of the drug response prediction method SRMF. a The input data for SRMF include the available drug responses (such as active area values) in cancer cell lines, with unknown values marked in grey, the chemical structure-based drug similarity and the gene expression profile-based cell line similarity. b Rationale for the matrix factorization approach. Drugs and cell lines are mapped into a shared latent space of low dimensionality, and the associations between drugs and cell lines are described by the inner products of their coordinates in that space. c SRMF computes the coordinates U and V of drugs and cell lines in the shared latent space, which are used to reconstruct the drug response matrix, including the newly predicted drug responses
Fig. 2
Similar cell lines respond similarly to similar drugs. a Lower triangular matrix containing the Pearson correlation between each pair of gene expression profiles of cell lines. The X-axis and Y-axis represent cell lines classified by their cancer types (TCGA classification). Box plots show the correlations of gene expression within the same cancer type and between different cancer types. b Lower triangular matrix containing the Pearson correlation between each pair of drug sensitivity profiles of cell lines. Box plots show the correlations of drug sensitivity within the same cancer type and between different cancer types. c Box plots show the correlations of sensitivity profiles across cell lines within the same drug cluster and between different drug clusters. The drugs were hierarchically clustered according to the similarity of their chemical fingerprints. The one-sided Mann–Whitney U test was used to measure the statistical difference between two groups
Fig. 3
Evaluation of different prediction methods through simulations. We compared the performance of SRMF, KBMF and DLN for the estimation of target drug response. The dimensions of the simulation results are m = 100, n = 150. Details of the simulation methods are in Additional file 3. We varied the noise level, which represents the strength of the Gaussian noise added to the target response matrix, from 0 (no noise) to 0.5 (high noise). a and b represent the performance based on different statistics: drug-averaged PCC and drug-averaged RMSE
Fig. 4
Box plots of four methods on GDSC dataset with respect to different evaluation metrics. a Pearson correlation coefficient between predicted and observed response values of sensitive and resistant cell lines for each drug. b Root mean squared error between predicted and observed drug responses of sensitive and resistant cell lines for each drug. The t-test was used to measure the statistical difference between two groups.
Fig. 5
Prediction performance comparisons of four methods for the drugs targeting genes in the PI3K pathway with respect to two measurements. a Pearson correlation coefficient between predicted and observed response values of sensitive and resistant cell lines for each drug. b Root mean squared error between predicted and observed drug responses of sensitive and resistant cell lines for each drug
Fig. 6
The associations of drug sensitivity and cancer gene mutations were consistent for predicted response data. a and b grouped cell line response values for lapatinib based on their EGFR mutation profiles and ERBB2 mutation profiles, respectively. WT refers to the non-mutated (wild type) cell lines. c grouped cell line response values for PD-0332991 based on their CDKN2A mutation profile
Fig. 7
The new associations of drug sensitivity and cancer genes were identified based on a combination of newly predicted drug responses and available observations. a grouped cell line response values for PHA-665752 based on their MET amplification profiles. WT refers to the non-mutated (wild type) cell lines. b grouped cell line response values for rapamycin based on their TSC1 mutation profile
Fig. 8
Repositioning of rapamycin and identification of a novel genomic correlate of rapamycin sensitivity. a grouped cell line response values for PHA-665752 based on their tissue types. NSCLC refers to the non-small cell lung cancer. b The scatter plot displays the association between AK1RC3 expression and newly predicted rapamycin sensitivity. Red circles, NSCLC cell lines; black circles, cell lines from other tumour types
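
The captions for Figs. 3 to 5 report a Pearson correlation coefficient (PCC) and a root mean squared error (RMSE) computed for each drug, and Fig. 3 uses their averages over drugs. A minimal sketch of such per-drug averaging follows. The function name `drug_averaged_metrics`, the convention that rows are drugs, and the rule for skipping drugs with too few or constant values are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np


def drug_averaged_metrics(observed, predicted, mask):
    """Average per-drug PCC and RMSE over drugs (rows are assumed to be drugs).

    Only entries with mask == 1 are compared. A drug is skipped when it has
    fewer than two compared entries or when either vector is constant,
    because the PCC is undefined in those cases (an assumed convention).
    """
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    mask = np.asarray(mask).astype(bool)
    pccs, rmses = [], []
    for obs_row, pred_row, keep in zip(observed, predicted, mask):
        o, p = obs_row[keep], pred_row[keep]
        if o.size < 2 or o.std() == 0 or p.std() == 0:
            continue
        pccs.append(np.corrcoef(o, p)[0, 1])
        rmses.append(np.sqrt(np.mean((o - p) ** 2)))
    return float(np.mean(pccs)), float(np.mean(rmses))


# Toy check: predictions shifted by a constant keep PCC at 1 but give RMSE of 1.
obs = np.array([[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]])
pred = obs + 1.0
pcc, rmse = drug_averaged_metrics(obs, pred, np.ones_like(obs))
```

The toy check shows why the two statistics are reported together: a constant offset leaves the correlation perfect while the error is nonzero.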
