Beyond initial discovery of a pathogenic variant, establishing that a variant is recurrently associated with disease is important for understanding clinical impact and disease etiology. Disappointingly, our ability to characterize pathogenicity under varied circumstances is limited. Here we discuss the role of genetic and environmental background and how it affects variant penetrance and outcomes. Specifically, genetic and environmental settings determine penetrance, and we should expect lower penetrance where contexts are diverse. For example, when over 5000 ClinVar pathogenic and loss-of-function variants were assessed in two large biobanks, UK Biobank and BioMe, the mean penetrance was only 7%. This indicates that the participants in the family-based, clinical, and case-control studies that identified these variants were more homogeneous and enriched for etiologic co-factors, and that the winner’s curse was at play. We also emphasize that the outcome of interest can vary across conditions. The variant that causes hemoglobin S can increase the risk of death from sickling, lower the risk of death from malaria, or increase the risk of kidney disease, depending on the presence of other variants, the endemicity of malaria, and a suite of other factors. Overall, annotation on a single continuum from benign to pathogenic attempts to shoehorn a complex phenomenon into an overly simplistic framework. Variant effects often vary by context, and thus it is critical to assess potential pathogenicity in different settings. There is no panacea or easy fix, but we offer two recommendations for consideration. First, we need to routinely evaluate contexts such as sex and genetic ancestry by conducting stratified analyses and developing methods that can detect heterogeneous effects (e.g., female-to-male allele proportion ratios). Second, we need to consistently document what we know about effect modifiers in our annotation databases. These are not the only possible approaches, but they begin to provide means to create robust annotations of pathogenicity.
When we talk about the pathogenicity of genetic variants, what exactly are we talking about? Although this question on its surface may appear to be a trivial or simply philosophical question, it is not. It shapes the foundational logic of human genetics research and determines the utility of our work with respect to disease risk and clinical intervention. In brief, our standard definitions of pathogenicity refer to variants that are deleterious, harmful, or increase the probability of disease1. This sounds simple, but it is too simple, as this definition often leads us to ignore a key principle: Genes evolve and function in the contexts created by their environment, including other genetic variants. These contexts can determine penetrance and thus the ability of a variant to cause disease.
Variant pathogenicity often depends on context
A simple but informative example of the heterogeneity of pathogenicity is the beta globin variant that causes hemoglobin S (HbS). An individual who is homozygous for the HbS allele has sickle cell disease, which increases the risk of death at a young age2,3. However, this same allele in the context of a second allele that encodes HbA will reduce the risk of death at a young age in malaria endemic regions4,5. This decreased risk of death is the reason that the HbS allele is common in malaria endemic regions and has not been culled by evolution6,7. Furthermore, in regions without malaria, being heterozygous for the HbS allele may not affect risk of death at a young age, unless there exists another precipitating variant in that individual’s genome, or the carrier experiences hypoxia when exercising at high altitude8,9,10. In these distinct, yet malaria-free, contexts, the HbS variant may again increase death risk at a young age. To add to this complexity, data now indicate that older heterozygotes may have an increased risk of subclinical kidney pathology, and increased rates of acute renal failure when exposed to SARS-CoV-2 (ref. 11). Finally, variants that decrease the expression of alpha globin subunits (HBA1 and HBA2; alpha thalassemia)12,13 or allow for the persistent expression of gamma globin subunits into adulthood (HBG1 and HBG2; persistence of fetal hemoglobin)14 can greatly mitigate the risk of death due to HbS homozygosity. Thus, the pathogenicity of the HbS variant depends heavily on other alleles, the environment, and the health outcome being evaluated. HbS can be considered a “simple” case, but even in this situation, pathogenic potential is strongly shaped by multiple contextual factors (Fig. 1).
This example clarifies that making a universal pathogenicity assessment uses an oversimplistic framework to describe an inherently complex phenomenon. Even when a variant can cause disease, it often does not, and knowing the modifying factors is critical to evaluating pathogenicity. Thus, assuming that genetic variants have a single unidirectional effect on one outcome obscures the complex genetic architecture of disease15. Regulatory processes, genetic buffering, environmental interactions, and epistasis can all play roles in determining the impact of a given variant16,17,18,19, and these contexts cannot be ignored if we want to understand variant pathogenicity15.
Defining pathogenicity is especially hard for variants with low penetrance and variable expressivity
Nonetheless, attempts are still made to produce “universal pathogenicity” assessments20. These assessments may make sense in the context of highly penetrant variants that cause Mendelian disease, but what about low penetrance variants with variable expressivity? Allelic expression levels, epigenetic changes, cis variants, trans variants, environmental exposures, and other factors, including lifestyle, collectively shape variant impact21,22, and low penetrance variants make up a very large proportion of our annotations. When over 5000 pathogenic and loss-of-function variants were assessed in the UK Biobank and BioMe, the mean penetrance was unexpectedly low (6.9%, 95% CI: 6.0–7.8%)23. While some of this pattern can be explained by the factors that drive the winner’s curse (i.e., inflated magnitude of initial associations due to low power, publication bias, model overfitting, etc.)24,25, smaller associations should also be expected when the study participants are more diverse. Family-based, clinical, and case-control studies have more homogeneous participants, and because study entry is partly conditioned on disease status, these study groups are enriched for etiologic co-factors. This means lower penetrance and smaller effect sizes will often be observed in large population-based cohorts22,26,27, even when there are subgroups where penetrance is high. When a variant has a smaller effect size and reduced penetrance in a heterogeneous, population-based sample, it is important to examine that variant in multiple contexts. This can identify potentially sensitive subgroups, such as ancestries, environments, or multiplexed families with higher penetrance and pathogenicity. Overall, assessment of variants in multiple contexts28,29 is critical to understanding differences in the causal mechanisms of disease in distinct groups.
Downplaying this heterogeneity impairs clinical communication and practice
Regardless of the reason for low penetrance, it creates a problem for pathogenicity assessments and clinical genetic practice. When these annotations are used as screening tests for disease risk, there is a systematic problem with test specificity (i.e., the ability of a test to identify true negatives and avoid false positives30). Since penetrance among many pathogenic variants is often low, most people with these variants will not develop disease. Thus, when applied clinically this can result in a very large number of false positives and subsequent unnecessary actions. While a strong argument can be made for tolerating false positives (type 1 error) in the early stages of genetic discovery research31,32, false positives in clinical settings can lead to patient anxiety, needless expense, and harm33.
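The arithmetic behind this specificity problem can be sketched in a few lines. The example below is only a back-of-envelope illustration: it assumes, as a deliberate simplification, that the mean penetrance and confidence limits reported in ref. 23 apply uniformly to every carrier flagged by a screen, which the rest of this commentary argues is not the case. The function name and the cohort size of 1000 carriers are hypothetical choices made for illustration.

```python
def expected_screening_outcomes(n_carriers, penetrance):
    """Return (expected affected, expected unaffected) among flagged carriers.

    Simplifying assumption: one penetrance value applies to all carriers.
    """
    if not 0.0 <= penetrance <= 1.0:
        raise ValueError("penetrance must be between 0 and 1")
    affected = n_carriers * penetrance
    return affected, n_carriers - affected


# Mean penetrance and 95% CI limits quoted in the text (ref. 23).
for label, penetrance in [("lower CI", 0.060), ("mean", 0.069), ("upper CI", 0.078)]:
    affected, unaffected = expected_screening_outcomes(1000, penetrance)
    print(f"{label}: ~{affected:.0f} affected, ~{unaffected:.0f} unaffected per 1000 carriers")
```

Under this simplification, the large majority of carriers flagged by a "pathogenic" annotation would never develop disease, which is the false-positive burden described above.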
One way to vet putative pathogenicity is to perform experiments that biologically validate the effects of genetic variants. However, such experiments are limited in their generalizability, and they are restricted by the conditions under which the experiments are performed. In vitro experiments and animal models can clearly demonstrate causal and mechanistic evidence of pathogenicity, but they cannot test or create all relevant contexts. For example, the experimental temperature, day-night cycle, diet, air quality, or hormonal milieu may not reflect those of the humans that carry a potentially pathogenic variant. Geneticists are aware of these dynamics, known as reaction norms, and they have been taught in genetics classes for decades34,35. However, some physicians and the general public may not be as familiar with how this fundamental principle of genetic variation can affect our annotations.
Universal pathogenicity assessments also create a systematic problem with sensitivity (i.e., the ability of a test to identify true positives and avoid false negatives30). This is partly because our annotation guidelines36, even when thoughtfully refined37, have traditionally considered the “absence of evidence” to be “evidence of absence”. In other words, when a variant is observed in a high number of healthy people (e.g., minor allele frequency [MAF] >5%) and it has not yet been linked to disease, then it can be labeled benign. Unfortunately, this approach fails to account for the determinants of penetrance. If a key determinant of penetrance was not present among the observations, then a conditionally pathogenic variant can be labeled a Variant of Unknown Significance or even Benign. This creates many issues, but it seems particularly troublesome in the clinic when sequencing patients to identify the cause of rare syndromes38. Imagine trying to annotate the phenylalanine hydroxylase gene variants that cause phenylketonuria39 in a population with almost no access to foods that contain phenylalanine. Phenylalanine hydroxylase variants would appear benign in this context. Hence, in most cases when variant pathogenicity is assessed, the process identifies what can cause disease, but importantly, it does not identify what will cause disease in a given person at a given time40,41. This context-agnostic approach has utility, but its limitations must be acknowledged and accounted for.
Existing genomic methods improve when context is considered
Despite the drawbacks of often defining pathogenicity as a binary and immutable feature of variants, genetic researchers have created many techniques of great utility. For example, molecular algorithms have been developed that can predict loss of protein function and these have high value in many settings42,43,44. We also now have protocols for molecular and clinical validation with laboratory-based functional assays45, and the longitudinal tracking of sequenced individuals in electronic health records46. Furthermore, several key papers have improved our thinking about the necessity of using diverse convergent evidence for causal reasoning in genomics31,47,48. Perhaps the most impressive advance in this area is the scoring system developed by ClinGen that assembles and interprets empirical evidence for pathogenicity49. However, these approaches can only do so much when context is not explicitly considered. For example, even if we could develop a prediction algorithm that perfectly determined loss-of-function in any protein, we would still not know if loss-of-function was good or bad for any individual (given the remainder of their genome, their environment, and the phenotype in question)50,51,52,53,54,55,56. Take for instance a protein that can convert pro-carcinogenic compounds to carcinogens. Loss-of-function of this protein may be beneficial in the context of high procarcinogen exposure57. Hence, the context, in this case the environment, can change a variant from beneficial to pathogenic and vice versa.
Therefore, even if we are using the best methods, we can observe conflicting evidence of pathogenicity when we do not explicitly consider context. This is particularly relevant for common variants. If a given variant is detrimental in all contexts, then this variant will usually be observed as a rare or de novo variant. In other words, variants are persistently culled by evolution when they reduce reproductive fitness in all contexts, but they can be maintained in the contexts where they do not reduce reproductive fitness. This may be especially evident when we consider pleiotropy, because antagonistic pleiotropy appears to play a major role in the persistence of several human disease variants58,59. For example, the strongest genetic determinant of Alzheimer’s Disease, APOE4 (refs. 60,61), also prevents death from diarrhea in childhood62,63. Our ancestors probably needed infection protection for their reproductive fitness, and one of the variants that met this early life requirement also increased the risk of a late life disease, Alzheimer’s Disease62,63,64,65,66. Thus, it makes very little sense to talk about the universal pathogenicity of any common variant. However, from a practical perspective, it is hard to do anything else.
Context is complex—how can we specify it?
Context is easy to invoke as a concept, but the relevant context, or determinants of penetrance, can differ for virtually every variant. Thus, when operationalizing research questions: What contexts do we measure? What contexts do we analyze? What phenotype do we examine? Even in the simplest research case with a single SNP, the potentially relevant context can be a cryptic and computationally impractical search space. Unfortunately, this explodes into intractability when considering genome-wide association or next-generation sequencing data (millions of SNPs and potentially thousands of environmental exposome variables). So, how can this problem be addressed? How can contexts that need attention be identified? It may be most practical to start with common and easily measured “contexts” that are known to have strong biological functions. This will help to optimize precision, statistical power, and the likelihood of documenting context-dependent pathogenicity.
With these features in mind, biological sex is among the easiest contexts to evaluate. It is easily measurable, it divides all human populations approximately in half, and there are many anatomic, physiologic, and pathophysiologic distinctions that align with it. Thus, we can, and probably should, run sex-stratified sensitivity analyses in most genetic research studies67,68,69, especially when a trait is sexually dimorphic70. Failure to do this can obscure important biological patterns. Another step would be to encourage new methods for probing the X chromosome, a chromosome that is often ignored in association analyses. We have already started on this strategy by analyzing the female-to-male allele frequency ratio as a tool for the discovery of pathogenic variants (Equation 1)71. The reasoning is as follows: females have two copies of all non-pseudoautosomal X-chromosome loci and males have only one. Thus, females can be biologically more resilient to the presence of harmful variants at these sites. The exception is variants with dominant effects, for which the ratio will not be a useful detection tool. In any dataset of adult humans, when a non-pseudoautosomal X-chromosome variant exists at a higher proportion in females, this pattern can serve as evidence that the variant may increase the probability of premature death.
Following this simple logic, we used gnomAD data72 to characterize this phenomenon. Our methods are fully described in ref. 71, but in short, we obtained exome data from the X chromosomes of 76,702 males and 64,754 females. Then, we calculated female-to-male allele frequency ratios for the 44,606 variants that had an allele count of at least 5. None of the pseudoautosomal variants had a ratio above 11, but 319 of the non-pseudoautosomal variants had ratios above this empiric threshold.
Only 25 of these high-ratio variants were annotated in ClinVar and had an rs number. Most of these variants had high sex-averaged MAFs and no known associations with disease, and they were listed as benign or likely benign (Table 1). As an example, one of the 25 variants had a sex-averaged MAF of 0.13, no known disease associations, and was listed as likely benign. This site had been genotyped 38,527 times in males (one locus each) and 104,056 times in females (two loci each), so there was no shortage of data. Overall, the variant was observed a total of 18,736 times, but not one of these observations came from a male or a homozygous female. It was only found in heterozygous females. Thus, it is likely that this variant is almost 100% lethal (perhaps even embryonic lethal) in males and homozygous females, but is without large effect in heterozygous females. When we considered the other 24 variants, we found similar patterns, although the comparisons were less extreme.
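To make the example variant concrete, the sketch below applies the female-to-male allele proportion ratio (Equation 1) to the counts quoted above. It is an illustration, not the pipeline described in ref. 71. Two choices are assumptions introduced here: a variant never seen in males is reported as an infinite ratio (the original analysis may handle zero counts differently), and the expected number of homozygous females uses Hardy-Weinberg proportions at the observed female allele proportion. The function and variable names are hypothetical.

```python
import math


def allele_proportion_ratio(v_f, a_f, v_m, a_m):
    """Female-to-male allele proportion ratio: (v_f / a_f) / (v_m / a_m).

    Assumption: if the variant is never observed in males (v_m == 0) but is
    observed in females, the ratio is reported as infinity.
    """
    if a_f <= 0 or a_m <= 0:
        raise ValueError("total allele counts must be positive")
    female_prop = v_f / a_f
    male_prop = v_m / a_m
    if male_prop == 0:
        return math.inf if female_prop > 0 else math.nan
    return female_prop / male_prop


# Counts quoted in the text for the example variant.
V_F, A_F = 18_736, 104_056  # every observation was in a heterozygous female
V_M, A_M = 0, 38_527        # never observed in males

ratio = allele_proportion_ratio(V_F, A_F, V_M, A_M)
sex_averaged_maf = (V_F + V_M) / (A_F + A_M)

# Back-of-envelope expectations if sex and zygosity did not matter.
female_prop = V_F / A_F
expected_male_obs = female_prop * A_M           # males at the female proportion
n_females = A_F // 2                            # two loci per female
expected_hom_females = female_prop ** 2 * n_females  # Hardy-Weinberg assumption

print(f"ratio = {ratio}")
print(f"sex-averaged MAF = {sex_averaged_maf:.2f}")
print(f"expected male observations = {expected_male_obs:.0f}")
print(f"expected homozygous females = {expected_hom_females:.0f}")
```

Under these assumptions, thousands of male observations and well over a thousand homozygous females would be expected, against zero observed of either, while the pooled MAF alone still looks like that of a common benign variant.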
To further characterize these variants, we probed them with a diverse set of web-based bioinformatic resources: dbSNP73, VarSome74, OMIM75, and VENUS76,77. These databases provide additional information on evolutionary conservation, gene-phenotype relationships, protein-structure predictions, and other aspects of these variants that need consideration in pathogenicity assessments. We found that:
1. Existing annotation methods can miss sex-specific pathogenicity. We observed that 22 out of 25 (88%) high ratio variants are listed as Benign or Likely Benign in ClinVar (1 is listed as Conflicting [Uncertain Significance and Benign] and 2 are listed as Uncertain Significance). These variants are commonly observed in healthy heterozygous females and they achieve high sex-averaged MAFs, so they appear benign, but they are rarely observed in males (i.e., these variants are not often tolerated in males).
2. QC procedures can mislabel evidence of sex-specific pathogenicity as genotyping error. We looked in the second dataset from the gnomAD site (the genomes data) and observed that 22 out of the 25 (88%) high ratio variants failed QC filters74. Sex differences in MAF were assumed to be error rather than putative evidence of sex-specific pathogenicity. Thus, these QC filters may systematically remove variants with sex-specific pathogenicity before they can even be assessed.
3. Our ratio method identified genes that were already linked to clinical syndromes through other variants. In all, 23 of 25 (92%) genes implicated by the high ratio variants have specific links to clinical syndromes listed in OMIM75. The other two genes have tentative links to pathology described in their OMIM entries.
4. Structural predictions are not available or useful for most of these top ratio hits. Michaelangelo-VENUS structural predictions76,77 were only possible for 6 of the 25 variants (24%). VENUS requires the specification of a specific amino acid substitution at a specific site in the protein. This makes sense for some variants, but 19 of the 25 variants do not have that impact, or their exact impact on amino acid sequence cannot yet be specified (synonymous, intronic, splice donor variants, etc.).
5. Additional heterogeneity exists and some high ratio variants might be better tolerated by males and homozygous females in specific contexts. Some high ratio alleles had frequencies that differed by ancestry group, and this is consistent with the interpretation that these variants may not have sex-specific pathogenicity in all contexts.
Overall, these 5 points indicate that seeking and documenting evidence of sex-specific effects could improve pathogenicity annotations. The existing tools for variant characterization can only do so much if context is not explicitly evaluated. Finally, we note that the many potential mechanisms for sex-specific pathogenicity remain to be characterized, but there is some indication in our initial results that regulatory function may sometimes be involved. RegulomeDB evaluations of the 25 high-ratio variants provide diverse and nuanced information on the likelihood of regulatory function at these loci (Table 2). They reveal that 13 of the 25 high ratio variants (52%) have some indication of regulatory function: a rank less than three or a score greater than 0.5. A rank less than three indicates the presence of at least two strong pieces of experimental evidence that are consistent with regulatory function, and scores greater than 0.5 are in the top half of possible scores from models that predict transcription factor binding.
Sex differences in allele frequency on the X chromosome are a special case, but this pattern may also be found in autosomal variants that affect disease risk differently between males and females. Very large and very small allele proportion ratios in the autosomes may also be indicative of sex-specific effects that deserve further investigation. While this area of genetic research is still in its infancy, and thresholds for discovery and confirmatory findings are not yet established, we have already observed extreme female-to-male allele proportion ratios on autosomes (many standard deviations above or below the mean). Work in progress has already revealed a distribution of ratios on chromosome 21 that demonstrates this point (Table 3). Ratios this high are very unlikely to occur by chance. Finally, we note that biological sex is just the first and simplest context to consider. More complex situations such as ancestry and environmental exposures will need increased attention. For example, we already know that failing to assess ancestry-specific associations can generate ancestry-specific misinterpretations of genetic tests that disproportionately harm marginalized groups78. We need to collect genetic data on diverse ancestry groups79 and explicitly consider this context in order to avoid generating health disparities with ancestry-specific medical error80.
Overall, considering context will not solve all the problems in pathogenicity assessment, but it is a necessary step for addressing key clinical and translational issues in genetics. Sex-stratified GWAS70 and female-to-male allele proportion ratios71 can start us on a path that probes multiple determinants of penetrance. A lot of work remains in determining how to best explore contextual frameworks for variant pathogenicity, and other tools will be needed to evaluate additional factors, such as xenobiotic exposures and ancestry. However, biological sex is an ideal context to start with, because it will not require any new data. Information on biological sex is extractable from virtually all existing genomic data, and these data can be easily re-evaluated at low cost. Furthermore, it will not be hard or expensive to better evaluate sex differentials in allele frequency and improve the definition of benign in pathogenicity annotations. As an easy first step, ClinVar could present MAFs by sex. Overall, we call on the genetic research community to proactively consider context. While the optimal frameworks for achieving this goal are not fully established, we can start by routinely evaluating the sexes separately and documenting what is known about effect modifiers in our annotations. We have proposed a deeper dive into sex as a common effect modifier, but other strata should be explored and documented in annotations. Covariates should be collected in our datasets and exploratory sensitivity analyses should be more routine, or we will fail to identify many determinants of penetrance that have clinical relevance.
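One simple way to screen autosomal ratios for values that sit "many standard deviations above or below the mean" is sketched below. This is not the analysis behind Table 3. It assumes that ratios are compared on a log2 scale (so that a doubling and a halving are treated symmetrically), that a cutoff of three sample standard deviations is used, and that all ratios are finite and positive. Each of these is a choice made for illustration, since thresholds in this area are not yet established. The toy ratios are invented and are not chromosome 21 data.

```python
import math
import statistics


def flag_extreme_ratios(ratios, threshold=3.0):
    """Return indices of ratios whose log2 value lies more than `threshold`
    sample standard deviations from the mean log2 ratio.

    Assumes every ratio is finite and strictly positive.
    """
    logs = [math.log2(r) for r in ratios]
    mean = statistics.mean(logs)
    sd = statistics.stdev(logs)
    if sd == 0:
        return []  # no spread, nothing can be extreme
    return [i for i, x in enumerate(logs) if abs(x - mean) / sd > threshold]


# Invented toy ratios: 19 values near 1.0 plus one four-fold female excess.
toy_ratios = [1.0 + 0.01 * ((i % 5) - 2) for i in range(19)] + [4.0]
print(flag_extreme_ratios(toy_ratios))
```

A single extreme value inflates both the mean and the standard deviation, so with real data a robust alternative (for example, median-based scaling) may be preferable. The mean-and-SD version is shown only because it mirrors the description in the text.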
Conclusion
In summary, these strategies will not provide better answers to the old questions; they simply refine the questions so that they are more relevant. The old questions are generally context agnostic, and they have set the basis of our understanding reasonably well, but not well enough. If we want to keep advancing, we must now address the ubiquity of pleiotropy and the contextual determinants of penetrance.
Equation 1. The female-to-male allele proportion ratio71

$$R = \frac{V_f / A_f}{V_m / A_m}$$

$R$: allele proportion ratio
$V_f$: the minor allele count in females
$A_f$: the total allele count in females
$V_m$: the minor allele count in males
$A_m$: the total allele count in males
Data availability
All data are public and available from https://gnomad.broadinstitute.org/.
Code availability
The data handling and ratio calculations have been described previously71. We re-evaluated the hits in publicly available web databases, but there is no new code associated with this commentary.
References
Pathogenic variant. NCI Dictionary of Genetics Terms, https://www.cancer.gov/publications/dictionaries/genetics-dictionary/def/pathogenic-variant.
Bender, M. A. & Carlberg, K. Sickle cell disease. In: GeneReviews(®) (eds. Adam, M. P. et al.) (University of Washington, Seattle, 1993).
Ranque, B. et al. Estimating the risk of child mortality attributable to sickle cell anaemia in sub-Saharan Africa: a retrospective, multicentre, case-control study. Lancet Haematol. 9, e208–e216 (2022).
Depetris-Chauvin, E. & Weil, D. N. Malaria and early African development: evidence from the sickle cell trait. Econ. J. (London) 128, 1207–1234 (2018).
Gong, L., Parikh, S., Rosenthal, P. J. & Greenhouse, B. Biochemical and immunological mechanisms by which sickle cell trait protects against malaria. Malar J. 12, 317 (2013).
Allison, A. C. Protection afforded by sickle-cell trait against subtertian malarial infection. Br. Med. J. 1, 290–294 (1954).
Haldane, J. Disease and evolution. Ric. Sci. 19, 68–76 (1949).
Ashorobi, D., Ramsey, A., Yarrarapu, S. N. S. & Bhatt, R. Sickle cell trait. In StatPearls (StatPearls Publishing, 2022).
Kotila, T. R. Sickle cell trait: a benign state? Acta Haematol. 136, 147–151 (2016).
O’Connor, F. G. et al. Summit on exercise collapse associated with sickle cell trait: finding the ‘way ahead’. Curr. Sports Med. Rep. 20, 47–56 (2021).
Verma, A. et al. Association of kidney comorbidities and acute kidney failure with unfavorable outcomes after COVID-19 in individuals with the sickle cell trait. JAMA Intern. Med. 182, 796–804 (2022).
MedlinePlus. HBA1 gene - hemoglobin subunit alpha 1. https://medlineplus.gov/genetics/gene/hba1/ (2022).
MedlinePlus. HBA2 gene - hemoglobin subunit alpha 2. https://medlineplus.gov/genetics/gene/hba2/ (2022).
Serjeant, G. R. et al. A plea for the newborn diagnosis of Hb S-hereditary persistence of fetal hemoglobin. Hemoglobin 41, 216–217 (2017).
Kumar, S. & Gerstein, M. Unified views on variant impact across many diseases. Trends Genet. 39, 442–450 (2023).
Castel, S. E. et al. Modified penetrance of coding variants by cis-regulatory variation contributes to disease risk. Nat. Genet. 50, 1327–1334 (2018).
Hartman, J. L. 4th, Garvik, B. & Hartwell, L. Principles for the buffering of genetic variation. Science 291, 1001–1004 (2001).
Domingo, J., Baeza-Centurion, P. & Lehner, B. The causes and consequences of genetic interactions (Epistasis). Annu. Rev. Genomics Hum. Genet. 20, 433–460 (2019).
Virolainen, S. J., VonHandorf, A., Viel, K. C. M. F., Weirauch, M. T. & Kottyan, L. C. Gene-environment interactions and their impact on human health. Genes Immun. 24, 1–11 (2023).
Landrum, M. J. et al. ClinVar: improvements to accessing data. Nucleic Acids Res. 48, D835–D844 (2020).
Cooper, D. N., Krawczak, M., Polychronakos, C., Tyler-Smith, C. & Kehrer-Sawatzki, H. Where genotype is not predictive of phenotype: towards an understanding of the molecular basis of reduced penetrance in human inherited disease. Hum. Genet. 132, 1077–1130 (2013).
Kingdom, R. & Wright, C. F. Incomplete penetrance and variable expressivity: from clinical studies to population cohorts. Front. Genet. 13, 920390 (2022).
Forrest, I. S. et al. Population-based penetrance of deleterious clinical variants. JAMA 327, 350–359 (2022).
Kraft, P. Curses–winner’s and otherwise–in genetic epidemiology. Epidemiology 19, 649–651 (2008).
Ioannidis, J. P. A. Why most discovered true associations are inflated. Epidemiology 19, 640–648 (2008).
Xiang, J. et al. Reinterpretation of common pathogenic variants in ClinVar revealed a high proportion of downgrades. Sci. Rep. 10, 331 (2020).
Jackson, L. et al. Influence of family history on penetrance of hereditary cancers in a population setting. eClinicalMedicine 64, 102159 (2023).
Wojcik, G. L. et al. Genetic analyses of diverse populations improves discovery for complex traits. Nature 570, 514–518 (2019).
Mensah, G. A. et al. Emerging concepts in precision medicine and cardiovascular diseases in racial and ethnic minority populations. Circ. Res. 125, 7–13 (2019).
Trevethan, R. Sensitivity, specificity, and predictive values: foundations, pliabilities, and pitfalls in research and practice. Front. Public Health 5, 307 (2017).
Ciesielski, T. H. et al. Diverse convergent evidence in the genetic analysis of complex disease: coordinating omic, informatic, and experimental evidence to better identify and validate risk factors. BioData Min. 7, 10 (2014).
Williams, S. M. & Haines, J. L. Correcting away the hidden heritability. Ann. Hum. Genet. 75, 348–350 (2011).
Adams, M. C., Evans, J. P., Henderson, G. E. & Berg, J. S. The promise and peril of genomic screening in the general population. Genet. Med. 18, 593–599 (2016).
Woltereck, R. Weitere experimentelle Untersuchungen über Artveränderung, speziell über das Wesen quantitativer Artunterschiede bei Daphniden. Verh. Dtsch. Zool. Ges. 1909, 110–172 (1909).
Sultan, S. E. Phenotypic plasticity as an intrinsic property of organisms. In: Phenotypic plasticity and evolution: causes, consequences, and controversies 3–24 (CRC Press).
Acknowledgements
This work was supported in part by the National Library of Medicine (NLM R01 LM010098) and the National Institutes of Health (NIH U01HG010219).
Author information
Contributions
T.C. drafted the manuscript, and all authors (T.C., S.I., G.S., and S.W.) made substantial intellectual contributions and approved the submitted version.
Ethics declarations
Competing interests
The authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Ciesielski, T.H., Sirugo, G., Iyengar, S.K. et al. Characterizing the pathogenicity of genetic variants: the consequences of context. npj Genom. Med. 9, 3 (2024). https://doi.org/10.1038/s41525-023-00386-5