Genome Res. 2017 Oct;27(10):1715-1729.
doi: 10.1101/gr.226589.117. Epub 2017 Sep 1.

Optimizing genomic medicine in epilepsy through a gene-customized approach to missense variant interpretation

Joshua Traynelis et al. Genome Res. 2017 Oct.

Abstract

Gene panel and exome sequencing have revealed a high rate of molecular diagnoses among diseases where the genetic architecture has proven suitable for sequencing approaches, with a large number of distinct and highly penetrant causal variants identified among a growing list of disease genes. The challenge is, given the DNA sequence of a new patient, to distinguish disease-causing from benign variants. Large samples of human standing variation data highlight regional variation in the tolerance to missense variation within the protein-coding sequence of genes. This information is not well captured by existing bioinformatic tools, but is effective in improving variant interpretation. To address this limitation in existing tools, we introduce the missense tolerance ratio (MTR), which summarizes available human standing variation data within genes to encapsulate population level genetic variation. We find that patient-ascertained pathogenic variants preferentially cluster in low MTR regions (P < 0.005) of well-informed genes. By evaluating 20 publicly available predictive tools across genes linked to epilepsy, we also highlight the importance of understanding the empirical null distribution of existing prediction tools, as these vary across genes. Subsequently integrating the MTR with the empirically selected bioinformatic tools in a gene-specific approach demonstrates a clear improvement in the ability to predict pathogenic missense variants from background missense variation in disease genes. Among an independent test sample of case and control missense variants, case variants (0.83 median score) consistently achieve higher pathogenicity prediction probabilities than control variants (0.02 median score; Mann-Whitney U test, P < 1 × 10⁻¹⁶). We focus on the application to epilepsy genes; however, the framework is applicable to disease genes beyond epilepsy.
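The abstract does not spell out how the MTR is computed. As a minimal sketch, the code below assumes the ratio compares the observed missense fraction in a window of the coding sequence (missense over missense plus synonymous variants seen in the population data) with the fraction expected if every possible single-nucleotide change in that window were equally likely to be seen. The function name, its parameters, and the example counts are hypothetical and only illustrate the arithmetic; window size and variant filtering are left out.

```python
def missense_tolerance_ratio(obs_missense, obs_synonymous,
                             poss_missense, poss_synonymous):
    """Illustrative MTR for one window (assumed definition, not the authors' code).

    obs_*  : counts of distinct variants observed in population data
    poss_* : counts of all possible variants of that type in the window
    Returns None when the ratio is undefined (no observations or no
    possible missense changes).
    """
    observed_total = obs_missense + obs_synonymous
    possible_total = poss_missense + poss_synonymous
    if observed_total == 0 or possible_total == 0 or poss_missense == 0:
        return None
    observed_fraction = obs_missense / observed_total
    expected_fraction = poss_missense / possible_total
    return observed_fraction / expected_fraction


# Hypothetical window: 10 missense and 10 synonymous variants observed,
# out of 150 possible missense and 50 possible synonymous changes.
mtr = missense_tolerance_ratio(10, 10, 150, 50)
print(round(mtr, 3))
```

Under this assumed definition, a value below 1 means fewer missense variants were observed than the sequence context would predict, which is the sense in which low MTR regions are described as intolerant to missense variation.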

Figures

Figure 1.
ExAC v2 MTR plots for the 11 epilepsy genes: (A) CDKL5; (B) GRIN2A; (C) KCNQ2; (D) KCNT1; (E) LGI1; (F) PCDH19; (G) SCN1A; (H) SCN2A; (I) SCN8A; (J) SLC2A1; and (K) STXBP1. Regions in red achieved a study-wide FDR < 0.05 (Supplemental Data S3). MTR = 1 is depicted by the dashed blue line. Multiple gene-specific estimates are also depicted, including a gene's median MTR (black dashed line), 25th percentile MTR (dark green dashed line), and 5th percentile lowest MTR estimates (orange dashed line). The gray dashed line reflects how well that region of the gene was covered in the ExAC v2 sample data by showing the proportion of all ExAC v2 samples that achieved at least 10-fold coverage at the sites relevant to that codon (Methods).
Figure 2.
The distribution of the 606 qualified pathogenic variants (red circles) among the 11 epilepsy genes: (A) CDKL5; (B) SLC2A1; (C) LGI1; (D) STXBP1; (E) GRIN2A; (F) KCNQ2; (G) KCNT1; (H) PCDH19; (I) SCN1A; (J) SCN2A; and (K) SCN8A.
Figure 3.
ExAC v1 MTR plot with case and control missense variant distributions. The ExAC v1 MTR plots with the case-ascertained qualified pathogenic (red circles) and ExAC v2 Control Group 2 benign (blue circles) missense variants across epilepsy genes SCN1A (A,B) and KCNQ2 (C,D).
Figure 4.
Boruta feature evaluations: (A) CDKL5; (B) GRIN2A; (C) KCNQ2; (D) KCNT1; (E) LGI1; (F) PCDH19; (G) SCN1A; (H) SCN2A; (I) SCN8A; (J) SLC2A1; and (K) STXBP1. Blue box plots correspond to minimal, average, and maximum Z-score of a shadow feature. Red, yellow, and green box plots represent Z-scores of uninformative, inconclusive, and informative features, respectively. (*) Indicates the “highly informative” features for which the minimum nonoutlier random forest Z-score exceeded the maximum random forest Z-score of the best performing randomized shadow feature (red dashed line).
Figure 5.
The distribution of the GPP scores from the collection of the six gene-specific logistic models. The tallies of missense variants reported per group reflect the number of missense variants in that group that belong to the six genes for which a multivariate customized logistic model was described in Supplemental Table S5. Control Group 1 and Qualified Pathogenic were the only two groups used to fit the gene-specific models. Control Groups 2 and 3 (presumed enriched for benign) as well as the Unqualified Pathogenic group (presumed enriched for pathogenic variants above that found in population controls) represent missense variants not involved in feature evaluation or model fitting. The Mann-Whitney U tests compare the GPP score distributions from each group to the ExAC v2 Control Group 2 GPP score distribution.
Figure 6.
Real-time validation of a SCN2A gene-specific model. (A) SCN2A gene distributions of the GPP scores. All Mann-Whitney U tests compare groups to ExAC v2 Control Group 2. Control Groups 1–3 are mutually exclusive presumed benign missense variants. Pathogenic qualified, unqualified, and novel are mutually exclusive presumed pathogenic missense variants. For the bottom two plots of novel variants in Wolff et al. (2017), the “qualified novel” group is a “de novo” and severe pediatric epilepsy subset of the “all novel” group. (B) ROC curves for the model and individual features accurately predicting the 52 novel case and 188 Control Group 2 variants. (C–G) Distribution of the model and individual feature scores across all 13,425 possible SCN2A missense variants (gray) with the median SCN2A score depicted by a dashed black line. Also plotted are the 188 ExAC v2 Control Group 2 (blue), the 52 novel variants from Wolff et al. (2017) (red), and the 40 SCN2A unqualified pathogenic test variants (orange).
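
The captions for Figures 5 and 6 compare score distributions with Mann-Whitney U tests and ROC curves. The two are closely related: the U statistic divided by the number of case-control pairs equals the area under the ROC curve. The sketch below illustrates that relationship on a handful of made-up scores; the function names and the numbers are hypothetical and are not data from the study, and no P value is computed.

```python
def mann_whitney_u(cases, controls):
    """U statistic for cases versus controls.

    Counts the case-control pairs in which the case scores higher;
    tied pairs contribute one half.
    """
    u = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                u += 1.0
            elif case_score == control_score:
                u += 0.5
    return u


def auc_from_u(u, n_cases, n_controls):
    """Area under the ROC curve implied by U (U over the number of pairs)."""
    return u / (n_cases * n_controls)


# Made-up pathogenicity scores, for illustration only.
case_scores = [0.83, 0.90, 0.70]
control_scores = [0.02, 0.10, 0.75]

u = mann_whitney_u(case_scores, control_scores)
auc = auc_from_u(u, len(case_scores), len(control_scores))
print(u, round(auc, 3))
```

An AUC of 0.5 would mean case and control scores are indistinguishable, while values near 1 mean case variants almost always outscore control variants.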
