Erratum in

  • Science. 2012 Apr 20;336(6079):296

Abstract

Genome-sequencing studies indicate that all humans carry many genetic variants predicted to cause loss of function (LoF) of protein-coding genes, suggesting unexpected redundancy in the human genome. Here we apply stringent filters to 2951 putative LoF variants obtained from 185 human genomes to determine their true prevalence and properties. We estimate that human genomes typically contain ~100 genuine LoF variants with ~20 genes completely inactivated. We identify rare and likely deleterious LoF alleles, including 26 known and 21 predicted severe disease-causing variants, as well as common LoF variants in nonessential genes. We describe functional and evolutionary differences between LoF-tolerant and recessive disease genes and a method for using these differences to prioritize candidate genes found in clinical sequencing studies.

Figures

Figure 1
A. Derived allele frequency distribution in the CEU population for raw and high-confidence LoF variants, compared to missense and synonymous coding variants. Inset, distribution of the proportion of SNVs in each class at low allele counts (1-5). B. False positive rates (based on independent array genotyping) for LoF variants filtered for annotation artifacts and frequency-matched missense and synonymous SNVs. C. Distribution of frameshift indels along the coding region of affected genes, before and after filtering (a similar pattern is also seen for nonsense SNVs; data not shown).
Figure 2
A. Estimated probability of haploinsufficiency (presence of disease due to heterozygous loss of function), from a model trained on an independent set of LoF deletions and a set of known haploinsufficient genes. B. Association of coding variants with complex disease risk. Observed -log10(P) values for disease association in 17,000 individuals from 7 complex disease cohorts and a shared control group, following imputation of variants identified by the 1000 Genomes low-coverage pilot, are plotted against the expected null distribution for all LoF variants and frequency-matched missense and synonymous SNPs. C. Allele-specific expression analysis of nonsense variants, using RNA sequencing data from 119 lymphocyte cell lines. Circles show the proportion of LoF-carrying reads spanning each site across all heterozygous individuals. Variants predicted to cause nonsense-mediated decay (NMD, red) and those predicted to escape NMD (blue) are arbitrarily ordered by genome position within each class. Blue and red dashed horizontal lines indicate mean values in each class. Error bars, 95% CI.
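Panel B of Figure 2 plots observed -log10(P) values against their expected null distribution. A minimal sketch of how such expected values are commonly obtained, assuming P values are uniform under the null and using the i/(n+1) plotting position. The function names are hypothetical and the approach is an illustration, not the paper's stated method:

```python
import math


def expected_neg_log10_p(n):
    """Expected -log10(P) for n tests under a uniform null, largest value first.

    Assumption: the i-th smallest of n uniform P values is approximated
    by i / (n + 1), a common plotting position for Q-Q plots.
    """
    return [-math.log10(i / (n + 1)) for i in range(1, n + 1)]


def qq_points(p_values):
    """Pair each observed -log10(P) with its expected null value.

    Both lists are ordered from most to least significant, so the
    first pair corresponds to the smallest observed P value.
    """
    observed = sorted((-math.log10(p) for p in p_values), reverse=True)
    expected = expected_neg_log10_p(len(p_values))
    return list(zip(expected, observed))
```

Points lying along the diagonal (observed equal to expected) indicate no excess of association signal in that variant class; points above it indicate smaller P values than the null predicts.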
Figure 3
A. Distribution of selected evolutionary and functional parameters for recessive disease genes (blue) and LoF-tolerant genes (red) compared to all protein-coding genes (grey). Values are transformed to z scores to allow parameters to be plotted together. Boxes show interquartile range with medians indicated with a vertical black line, and whiskers terminate at the most extreme point less than 1.5 times the interquartile range from the box. For each pair of P values, top value refers to the recessive vs LoF-tolerant comparison and bottom refers to the LoF-tolerant vs genome background comparison. As many of the parameters are left-skewed the medians typically fall below zero. B. P value distribution for linear discriminant model (LDM) trained using LoF-tolerant and recessive disease genes, based on human-macaque DN/DS ratio and PPI network proximity to known recessive disease genes. C. Receiver-operating characteristic (ROC) curve for LDM distinguishing between LoF-tolerant and recessive disease genes, both when olfactory receptor genes (ORs) are included (solid line, AUC = 0.831) and excluded (dashed line, AUC = 0.814). DN/DS, ratio of missense to synonymous substitutions; CNC GERP, GERP score for conserved non-coding elements within 50 kb of gene; PPI, protein-protein interaction.
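Panel C of Figure 3 summarizes classifier performance as the area under the ROC curve (AUC). A minimal sketch of what that number measures, assuming each gene has a single model score and treating recessive disease genes as the positive class (a labeling chosen here for illustration). The function name is hypothetical, and this pairwise formulation is a standard equivalent of the AUC rather than the authors' code:

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A pair counts 1 when the positive example scores higher than the
    negative one, and 0.5 when the two scores are tied.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))
```

Under this reading, an AUC of 0.831 means that a randomly chosen gene from one class receives a higher score than a randomly chosen gene from the other class about 83% of the time; 0.5 corresponds to chance and 1.0 to perfect separation.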
