Int J Biol Sci. 2019 Nov 8;15(13):2911-2924.
doi: 10.7150/ijbs.33806. eCollection 2019.

GBDTCDA: Predicting circRNA-disease Associations Based on Gradient Boosting Decision Tree with Multiple Biological Data Fusion


Xiujuan Lei et al. Int J Biol Sci.

Abstract

Circular RNA (circRNA) is a non-coding RNA molecule with a closed-loop structure that plays a significant role in gene regulation. Many previous studies have shown that circRNAs can act as miRNA sponges. circRNAs are therefore also important for disease diagnosis, treatment and inference. However, traditional experimental approaches to verifying associations between circRNAs and diseases are time-consuming and costly. Few computational models exist for predicting potential circRNA-disease associations, which motivated us to propose a new one. In this study, we propose a machine-learning-based computational model, Gradient Boosting Decision Tree with multiple biological data, to predict circRNA-disease associations (GBDTCDA). The known circRNA-disease association data are downloaded from the CircR2Disease database (http://bioinfo.snnu.edu.cn/CircR2Disease/). The feature vector of each circRNA-disease association pair is composed of four parts: statistical information from different biological networks, graph-theoretic information from different biological networks, information from the circRNA-disease association network, and circRNA nucleotide sequence information. We use these feature vectors to train a gradient boosting decision tree regression model. Leave-one-out cross-validation (LOOCV) is then adopted to evaluate the performance of the model. GBDTCDA also performs well in predicting circRNAs related to several common diseases: the area under the ROC curve (AUC) values for basal cell carcinoma, non-small cell lung cancer and cervical cancer are 95.8%, 88.3% and 93.5%, respectively. To further illustrate the performance of GBDTCDA, a case study of breast cancer is also included. Based on these experimental results and analyses, GBDTCDA is a powerful tool for predicting potential circRNA-disease associations.

Keywords: Gradient Boosting; circRNA-disease associations; machine learning; multiple biological data.
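
The evaluation procedure described in the abstract (a gradient boosting decision tree regressor scored with LOOCV and AUC) can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes scikit-learn, whose GradientBoostingRegressor exposes the parameter names that appear in the figure captions (n_estimators, max_depth, subsample), and it uses synthetic feature vectors in place of the real four-part circRNA-disease features. The helper name loocv_scores and all data values are hypothetical.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneOut


def loocv_scores(X, y, **gbdt_params):
    """Return one held-out regression score per sample (hypothetical helper)."""
    scores = np.zeros(len(y), dtype=float)
    for train_idx, test_idx in LeaveOneOut().split(X):
        # Refit on all samples except one, then score the held-out sample.
        model = GradientBoostingRegressor(random_state=0, **gbdt_params)
        model.fit(X[train_idx], y[train_idx])
        scores[test_idx] = model.predict(X[test_idx])
    return scores


# Synthetic stand-in data (assumption): each row plays the role of the
# feature vector of one circRNA-disease pair, and y marks whether the
# association is known (1) or not (0).
rng = np.random.default_rng(0)
n_pairs, n_features = 40, 4
X = rng.normal(size=(n_pairs, n_features))
noise = rng.normal(scale=0.5, size=n_pairs)
y = (X[:, 0] + 0.5 * X[:, 1] + noise > 0).astype(int)

# Parameter values are illustrative only, not the tuned values from the paper.
scores = loocv_scores(X, y, n_estimators=20, max_depth=2, subsample=0.8)
auc = roc_auc_score(y, scores)
print(f"LOOCV AUC on synthetic data: {auc:.3f}")
```

Because the regressor outputs a continuous score for each held-out pair, the AUC can be computed directly from the ranking of those scores without choosing a classification threshold.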

Conflict of interest statement

Competing Interests: The authors have declared that no competing interest exists.

Figures

Figure 1. Flowchart of the computational model GBDTCDA.
Figure 2. Comparison of the AUC values of different methods.
Figure 3. AUC values for acne, atherosclerosis and basal cell carcinoma compared with other methods.
Figure 4. AUC values for breast cancer, colorectal cancer and non-small cell lung cancer compared with other methods.
Figure 5. AUC values for cervical cancer, endometrial cancer and glioblastoma compared with other methods.
Figure 6. Comparison of precision, recall and f1_measure across different methods.
Figure 7. Comparison of the top-k ranks across different methods.
Figure 8. AUC values for different values of the parameter n_estimators.
Figure 9. AUC values for different values of max_depth and min_samples_split.
Figure 10. AUC values for different values of the parameter min_samples_leaf.
Figure 11. AUC values for different values of min_samples_leaf and max_features.
Figure 12. AUC values for different values of the parameter subsample.
