Nat Methods. 2016 Jul;13(7):577-80.
doi: 10.1038/nmeth.3885. Epub 2016 May 30.

Data-driven hypothesis weighting increases detection power in genome-scale multiple testing


Nikolaos Ignatiadis et al. Nat Methods. 2016 Jul.

Abstract

Hypothesis weighting improves the power of large-scale multiple testing. We describe independent hypothesis weighting (IHW), a method that assigns weights using covariates independent of the P-values under the null hypothesis but informative of each test's power or prior probability of the null hypothesis (http://www.bioconductor.org/packages/IHW). IHW increases power while controlling the false discovery rate and is a practical approach to discovering associations in genomics, high-throughput biology and other large data sets.
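IHW assigns weights that are then used in a weighted Benjamini-Hochberg (BH) procedure: each p-value $p_i$ is divided by a non-negative weight $w_i$, with the weights averaging to 1, and ordinary BH is applied to the ratios $p_i / w_i$. With fixed weights of this kind and independent p-values, weighted BH controls the FDR at the target level α. The sketch below shows only this final step with hand-picked weights; the function name `weighted_bh` and the example weights are hypothetical, and the step in which IHW learns the weights from the covariate is not reproduced. The Bioconductor package linked above is the reference implementation.

```python
import numpy as np


def weighted_bh(pvalues, weights, alpha=0.1):
    """Weighted Benjamini-Hochberg: run BH on the ratios p_i / w_i.

    Weights must be non-negative and average to 1 (i.e. sum to m).
    Returns a boolean array marking the rejected hypotheses.
    """
    p = np.asarray(pvalues, dtype=float)
    w = np.asarray(weights, dtype=float)
    m = p.size
    if w.shape != p.shape or np.any(w < 0) or not np.isclose(w.mean(), 1.0):
        raise ValueError("weights must match p, be non-negative and average to 1")
    # A hypothesis with weight 0 can never be rejected, so map it to +inf.
    with np.errstate(divide="ignore", invalid="ignore"):
        q = np.where(w > 0, p / w, np.inf)
    order = np.argsort(q)
    # BH step-up: find the largest k with q_(k) <= alpha * k / m.
    thresholds = alpha * np.arange(1, m + 1) / m
    passed = np.nonzero(q[order] <= thresholds)[0]
    rejected = np.zeros(m, dtype=bool)
    if passed.size > 0:
        rejected[order[: passed[-1] + 1]] = True
    return rejected


# Toy example: the first three hypotheses are flagged by a (hypothetical) covariate.
p = np.array([0.03, 0.04, 0.06, 0.3, 0.5, 0.8])
w = np.array([1.5, 1.5, 1.5, 0.5, 0.5, 0.5])
print(weighted_bh(p, np.ones_like(p)))  # plain BH
print(weighted_bh(p, w))                # weighted BH
```

In this toy example plain BH (all weights equal to 1) rejects nothing, while up-weighting the three hypotheses that the covariate marks as promising yields three rejections at the same α.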


Figures

Figure 1. Histograms stratified by the covariate as a diagnostic plot.
a) The histogram of all p-values shows a mixture of a uniform distribution (corresponding to the true null hypotheses) and an enrichment of small p-values near zero (corresponding to the alternatives). Such a well-calibrated histogram is the starting point for most multiple testing methods. b-d) Histograms after splitting the hypotheses into three groups based on the values of the covariate. Shown is an example of a good covariate: each histogram still shows a uniform component, but the mixture proportion and/or the shape of the alternative distribution differ between the groups. If all histograms look the same, the covariate is uninformative, and using it would not increase power. If the right-hand tails (large p-values) are no longer uniform, independence under the null is violated, and application of IHW is not valid.
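The diagnostic of Figure 1 can also be computed numerically: bin the covariate into quantile groups and histogram the p-values within each group. This is a minimal sketch on simulated data; the function name, the three tertile groups, the 20 bins and the Beta(0.2, 1) alternative distribution are all illustrative assumptions, not taken from the paper.

```python
import numpy as np


def stratified_pvalue_histograms(pvalues, covariate, n_groups=3, n_bins=20):
    """Group hypotheses by covariate quantiles and histogram the p-values per group.

    Returns the shared bin edges and one array of bin counts per group.
    """
    p = np.asarray(pvalues, dtype=float)
    x = np.asarray(covariate, dtype=float)
    # Interior quantile cut points, e.g. the two tertile boundaries for n_groups=3.
    cuts = np.quantile(x, np.linspace(0.0, 1.0, n_groups + 1)[1:-1])
    groups = np.searchsorted(cuts, x, side="right")  # labels 0 .. n_groups - 1
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    counts = [np.histogram(p[groups == g], bins=edges)[0] for g in range(n_groups)]
    return edges, counts


# Simulated data (illustrative): alternatives become more common as the covariate grows.
rng = np.random.default_rng(1)
m = 30_000
x = rng.uniform(size=m)
is_alt = rng.uniform(size=m) < 0.3 * x
p = np.where(is_alt, rng.beta(0.2, 1.0, size=m), rng.uniform(size=m))

edges, counts = stratified_pvalue_histograms(p, x)
for g, c in enumerate(counts):
    # First bins: enrichment of small p-values; last bins: the flat null component.
    print(f"group {g}: first bins {c[:3]}, last bins {c[-3:]}")
```

An informative covariate shows up as a left-most bin that grows from group to group while, within each group, the right-hand bins stay roughly flat, as in panels b-d.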
Figure 2. Performance evaluation.
Panels a-c show the number of discoveries with IHW and BH on real data as a function of the target FDR. a) RNA-Seq dataset [13], with the mean of normalized counts for each gene as the covariate. b) SILAC dataset [15], with the number of peptides quantified per protein as the covariate. c) hQTL dataset [16] for chromosome 21, with the genomic distance between SNPs and ChIP-seq signals as the covariate. Independent filtering with different distance cutoffs was also applied. d) Weight function learned by IHW at α = 0.1 for the hQTL dataset; one curve is shown for each of the five folds in the data-splitting scheme. Panels e-h benchmark different methods based on simulations; brief descriptions of each method are in Table 2. e-f) Type I error control when all null hypotheses are true, shown as the true FDR against the nominal significance level α. e) All methods shown make too many false discoveries. f) BH, FDRreg, and IHW control the FDR; LSL-GBH and Clfdr are slightly anticonservative. g-h) Implications of different effect sizes. The two-sample t-test was applied to normal samples (n = 2 × 5, σ = 1) with either the same mean (nulls) or means differing by the effect size indicated on the x-axis (alternatives). The fraction of alternatives was 0.05, the pooled sample variance was used as the covariate, and the nominal level was α = 0.1 (dotted line). g) Actual FDR (y-axis). h) Power analysis; all methods show improvement over BH.
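The simulation behind panels g-h can be sketched as follows: for each hypothesis, two groups of n = 5 normal samples with σ = 1, a fraction 0.05 of alternatives whose second group is shifted by the effect size, and an equal-variance two-sample t-test. We read "pooled sample variance" as the variance of all 2 × 5 values combined, ignoring group labels; this reading is an assumption. Under the null all ten values are i.i.d. normal, so the t-statistic, which is unchanged by shifting and rescaling the data, is independent of this variance (Basu's theorem), while for alternatives the mean shift inflates it, which is what makes it informative. The function name, the number of hypotheses and the effect size of 1.5 are hypothetical choices.

```python
import numpy as np
from scipy import stats


def simulate_ttest_pvalues(m=10_000, n=5, effect=1.5, frac_alt=0.05, sigma=1.0, seed=0):
    """Simulate m two-sample t-tests with n normal samples per group.

    Returns the p-values, the covariate (variance of all 2n values per test,
    ignoring group labels) and a boolean mask of the true alternatives.
    """
    rng = np.random.default_rng(seed)
    is_alt = np.zeros(m, dtype=bool)
    is_alt[: int(round(frac_alt * m))] = True
    x = rng.normal(0.0, sigma, size=(m, n))
    # Alternatives: the second group's mean is shifted by the effect size.
    y = rng.normal(0.0, sigma, size=(m, n)) + effect * is_alt[:, None]
    # Row-wise equal-variance two-sample t-test.
    pvalues = stats.ttest_ind(x, y, axis=1).pvalue
    covariate = np.var(np.hstack([x, y]), axis=1, ddof=1)
    return pvalues, covariate, is_alt


p, cov, is_alt = simulate_ttest_pvalues()
null = ~is_alt
# Under the null, p-values should look uniform and be unrelated to the covariate ...
print("mean null p-value:", p[null].mean())
print("corr(p, covariate) among nulls:", np.corrcoef(p[null], cov[null])[0, 1])
# ... while alternatives have a larger covariate on average.
print("mean covariate, nulls vs alternatives:", cov[null].mean(), cov[is_alt].mean())
```

The returned p-values and covariate could then be passed to a weighted BH procedure or to the IHW package.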
Figure 3. True discovery rate and informative covariates.
a) Schematic representation of the density $f_i$, which is composed of the alternative density $f_{1,i}$ weighted by its prior probability $\pi_{1,i}$ and the uniform null density weighted by $\pi_{0,i}$. b-d) The true discovery rate (tdr) of individual tests can vary. In b), the test has high power, and $\pi_{0,i}$ is well below 1. In c), the test has equal power, but $\pi_{0,i}$ is higher, leading to a reduced tdr. In d), $\pi_{0,i}$ is as in b), but the test has little power, again leading to a reduced tdr. e) If an informative covariate is associated with each test, the distribution of the p-values from multiple tests differs across values of the covariate. The contours represent the joint density of p-values and covariate. The decision boundary of the BH procedure depends only on the p-values and not on the covariate (dashed red line). In contrast, the decision boundary of IHW is a step function; each step corresponds to one group, i.e., to one weight. f) By Equation (1), the density of the tdr also depends on the covariate. The decision boundary of the BH procedure (dashed red line) leads to a suboptimal set of discoveries, in this example with a higher than optimal tdr for intermediate covariate values and a lower than optimal tdr otherwise. In contrast, IHW approximates a line of constant tdr, implying efficient use of the FDR budget. An important feature of IHW is that it works directly on p-values and covariates rather than explicitly estimating the tdr.
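Panel a corresponds to the standard two-groups model, in which the p-value density of test $i$ is $f_i(p) = \pi_{0,i} + \pi_{1,i} f_{1,i}(p)$ and the true discovery rate is $\mathrm{tdr}_i(p) = \pi_{1,i} f_{1,i}(p) / f_i(p)$, i.e., one minus the local false discovery rate. Equation (1) of the paper is not reproduced in this excerpt, so these are the textbook definitions rather than a quotation. The sketch evaluates the tdr at a single p-value for the three situations of panels b-d, assuming a Beta(a, 1) alternative density (smaller a means more power); the function names and parameter values are illustrative assumptions.

```python
import numpy as np


def alt_density(p, a):
    """Beta(a, 1) density a * p**(a - 1); for 0 < a < 1 it is concentrated near p = 0."""
    return a * np.power(p, a - 1.0)


def tdr(p, pi0, a):
    """True discovery rate pi1 * f1(p) / (pi0 + pi1 * f1(p)) in the two-groups model."""
    pi1 = 1.0 - pi0
    f1 = alt_density(p, a)
    # The null density is uniform on [0, 1], i.e. equal to 1, so f(p) = pi0 + pi1 * f1(p).
    return pi1 * f1 / (pi0 + pi1 * f1)


p = 0.01  # the same p-value in all three scenarios
tdr_b = tdr(p, pi0=0.7, a=0.2)   # b) high power, pi0 well below 1
tdr_c = tdr(p, pi0=0.95, a=0.2)  # c) same power, higher pi0
tdr_d = tdr(p, pi0=0.7, a=0.8)   # d) pi0 as in b), but little power
print(f"b: {tdr_b:.3f}  c: {tdr_c:.3f}  d: {tdr_d:.3f}")
```

Both a higher $\pi_{0,i}$ and a flatter alternative density lower the tdr at the same p-value, which is why a single p-value threshold, as in BH, does not spend the FDR budget evenly across covariate values.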


References

    1. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Methodological). 1995;57:289–300.
    2. Benjamini Y, Krieger AM, Yekutieli D. Adaptive linear step-up procedures that control the false discovery rate. Biometrika. 2006;93:491–507.
    3. Storey JD, Taylor JE, Siegmund D. Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: a unified approach. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2004;66:187–205.
    4. Efron B. Large-scale inference: empirical Bayes methods for estimation, testing, and prediction. Cambridge University Press; 2010.
    5. Strimmer K. A unified approach to false discovery rate estimation. BMC Bioinformatics. 2008;9:303.
