Review
Hum Genet. 2012 Oct;131(10):1639-54.
doi: 10.1007/s00439-012-1194-y. Epub 2012 Jul 3.

Risk estimation and risk prediction using machine-learning methods

Jochen Kruppa et al. Hum Genet. 2012 Oct.

Abstract

After an association between genetic variants and a phenotype has been established, further study goals comprise the classification of patients according to disease risk or the estimation of disease probability. To accomplish this, different statistical methods are required, and specifically machine-learning approaches may offer advantages over classical techniques. In this paper, we describe methods for the construction and evaluation of classification and probability estimation rules. We review the use of machine-learning approaches in this context and explain some of the machine-learning algorithms in detail. Finally, we illustrate the methodology through application to a genome-wide association analysis on rheumatoid arthritis.

Figures

Fig. 1. Path to construct, evaluate and validate a rule of classification or probability estimation
Fig. 2. Flowchart of the systematic literature search
Fig. 3. (a) ROC curves for all methods in selected SNP sets in the test data. (b) ROC curves for Random Jungle in regression mode in all SNP sets in the test data
Fig. 4. Brier scores for scores based on lasso or Random Jungle regression in the test data
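
The two evaluation measures named in the captions of Fig. 3 and Fig. 4 can be illustrated with a minimal sketch. This is not the authors' code: the function names `brier_score` and `auc` and the toy data are hypothetical, chosen only for illustration. The sketch assumes binary outcomes coded 0/1. It computes the Brier score as the mean squared difference between predicted probability and observed outcome, and the area under the ROC curve as the fraction of case-control pairs in which the case receives the higher score, with ties counted as one half.

```python
def brier_score(y_true, p_hat):
    """Mean squared difference between predicted probabilities and 0/1 outcomes.

    Lower is better; 0 means perfect probability estimates.
    """
    if len(y_true) != len(p_hat) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, p_hat)) / len(y_true)


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of cases and controls.

    Equals the probability that a randomly chosen case (y = 1) scores
    higher than a randomly chosen control (y = 0); ties count one half.
    """
    cases = [s for y, s in zip(y_true, scores) if y == 1]
    controls = [s for y, s in zip(y_true, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for d in controls:
            if c > d:
                wins += 1.0
            elif c == d:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical toy data: two cases and two controls with estimated risks.
y = [1, 0, 1, 0]
p = [0.9, 0.2, 0.6, 0.7]
print(brier_score(y, p))
print(auc(y, p))
```

The ROC-based measure depends only on how the scores rank cases against controls, whereas the Brier score also penalises probabilities that are far from the observed outcome, which is why the two are reported separately for classification and for probability estimation.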

