Nat Commun. 2023 May 25;14(1):3030.
doi: 10.1038/s41467-023-38795-w.

A computational method for cell type-specific expression quantitative trait loci mapping using bulk RNA-seq data

Paul Little et al. Nat Commun.

Abstract

Mapping cell type-specific gene expression quantitative trait loci (ct-eQTLs) is a powerful way to investigate the genetic basis of complex traits. A popular method for ct-eQTL mapping is to assess the interaction between the genotype of a genetic locus and the abundance of a specific cell type using a linear model. However, this approach requires transforming RNA-seq count data, which distorts the relation between gene expression and cell type proportions and results in reduced power and/or inflated type I error. To address this issue, we have developed a statistical method called CSeQTL that allows for ct-eQTL mapping using bulk RNA-seq count data while taking advantage of allele-specific expression. We validated the results of CSeQTL through simulations and real data analysis, comparing CSeQTL results to those obtained from purified bulk RNA-seq data or single cell RNA-seq data. Using our ct-eQTL findings, we were able to identify cell types relevant to 21 categories of human traits.
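The linear-model approach described above, which tests a genotype-by-cell-type-proportion interaction, can be sketched as follows. This is a minimal illustration, not CSeQTL: it assumes a single cell type proportion, additive genotype coding (0/1/2 copies of the alternative allele), and expression that has already been transformed as log2(count + 1), which is the kind of transformation the abstract argues distorts the count-scale relationship. The function names `log_transform` and `interaction_eqtl_fit` are hypothetical.

```python
import numpy as np


def log_transform(counts):
    """Common pre-processing step before a linear model: log2(count + 1)."""
    return np.log2(np.asarray(counts, dtype=float) + 1.0)


def interaction_eqtl_fit(expr, genotype, proportion):
    """Ordinary least squares fit of the (hypothetical) interaction model

        expr = b0 + b1*genotype + b2*proportion + b3*genotype*proportion + error

    `genotype` is coded 0/1/2 and `proportion` is one cell type's abundance.
    A non-zero b3 is read as evidence that the eQTL effect depends on that
    cell type. Returns the coefficient vector [b0, b1, b2, b3].
    """
    y = np.asarray(expr, dtype=float)
    g = np.asarray(genotype, dtype=float)
    p = np.asarray(proportion, dtype=float)
    # Design matrix columns: intercept, genotype, proportion, interaction
    X = np.column_stack([np.ones_like(g), g, p, g * p])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Usage on simulated counts whose genotype effect scales with the proportion
rng = np.random.default_rng(0)
n = 200
g = rng.integers(0, 3, size=n)
p = rng.uniform(0.1, 0.9, size=n)
counts = rng.poisson(50 * (1 + p * g))
print(interaction_eqtl_fit(log_transform(counts), g, p))
```

Because the simulated counts are generated multiplicatively on the count scale, the fitted interaction on the log scale only approximates the true cell type-specific effect, which is the mismatch the abstract describes.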

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Summary of the results from simulation studies.
a Simulated cell type (CT) proportions for three scenarios. Scenario 1: equally abundant and highly variable cell type proportions. Scenario 2: variable abundance and smaller variance. Scenario 3: scenario 2 modified by adding outliers in the cell type proportions. Box plots and violin plots were derived from n = 300 simulated cell type proportions. For each box plot, the box ranges from Q1 (the first quartile) to Q3 (the third quartile), and the median is indicated by a line across the box. The whiskers extend from Q1 and Q3 to the most extreme data points within 1.5 × IQR of the box, where IQR = Q3 − Q1. b Simulation results under the global null (i.e., no eQTL for any cell type) for different scenarios and methods (columns of the plots) and reference allele expression configurations (rows of the plots). Each reference allele configuration is denoted by the fold changes of reference allele expression. For example, “10_0.1” indicates a fold change of 10 in CT2 over CT1 and 0.1 in CT3 over CT1. c Simulation results under a mixture of null and alternative hypotheses, by model, scenario, and reference allele expression configuration. Results in (b) and (c) were obtained after trimming outliers.
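The box-plot convention used in these captions (box from Q1 to Q3, whiskers reaching the most extreme points within 1.5 × IQR) can be made concrete with a short sketch. It assumes NumPy's default linear-interpolation quartiles; plotting libraries may use slightly different quartile definitions, so exact whisker positions can differ. The function name `boxplot_stats` is hypothetical.

```python
import numpy as np


def boxplot_stats(values):
    """Quartiles, whiskers, and outliers under the 1.5 x IQR rule."""
    x = np.sort(np.asarray(values, dtype=float))
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme observations still inside the fences
    lower_whisker = x[x >= lower_fence].min()
    upper_whisker = x[x <= upper_fence].max()
    # Points beyond the fences are drawn individually as outliers
    outliers = x[(x < lower_fence) | (x > upper_fence)]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whiskers": (lower_whisker, upper_whisker),
        "outliers": outliers,
    }


stats = boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(stats["whiskers"], stats["outliers"])
```

For the values 1 to 9 plus 100, Q1 = 3.25 and Q3 = 7.75, so the whiskers span 1 to 9 and the value 100 is flagged as an outlier.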
Fig. 2. Summary of cell type compositions and the number of eGenes.
a Cell type proportion estimates for six cell types: astrocytes (Astro), excitatory neurons (Exc), inhibitory neurons (Inh), microglia (Micro), oligodendrocytes (Oligo), and oligodendrocyte precursor cells (OPC), from brain samples of schizophrenia (SCZ) patients and controls from the CommonMind Consortium (CMC), as well as from GTEx brain samples. Box plots are derived from n = 283 CMC-Control, n = 250 CMC-SCZ, and n = 175 GTEx Brain samples. b Cell type proportion estimates for seven cell types from GTEx whole blood samples. Cell type proportions were first estimated for 22 cell types (Supplementary Fig. 8) and then collapsed into seven cell types to avoid individual cell types with very low proportions and variances. Box plots are derived from n = 670 GTEx whole blood samples. In both (a) and (b), for each box plot, the box ranges from Q1 (the first quartile) to Q3 (the third quartile), and the median is indicated by a line across the box. The whiskers extend from Q1 and Q3 to the most extreme data points within 1.5 × IQR of the box, where IQR = Q3 − Q1. c Number of detected eGenes per stratum, by method and case/control status, for CMC. d Number of detected eGenes by method for GTEx whole blood. In (c) and (d), the x-axis is the number of cell type-specific eGenes plus 1 on a log10 scale, and the y-axis is the percentage of cell type-specific eGenes that overlap with eGenes from bulk eQTL mapping.
Fig. 3. Validation of CSeQTL results.
We compare CSeQTL results with results from cell type-purified bulk RNA-seq data (BLUEPRINT) or from scRNA-seq data from blood or brain. a, d The proportion of CSeQTL eGenes recovered among the top 500 (<5%) eGenes of other studies, under three configurations: (1) q-value < 0.005; (2) q-value < 0.005 and fold change ≥ 1.5; and (3) q-value < 0.001 and fold change ≥ 1.5. b, e Recovered eGene proportions under configuration 3 for all cell type pairs. c, f For each pair of cell types studied by CSeQTL vs. BLUEPRINT or Yazar et al., we evaluated the ratio of the observed number of overlapping eGenes to its expected value. The ratios and the corresponding p values from two-sided Fisher’s exact tests are labeled for each pair of matched cell types. g, h The proportion of eQTLs with consistent effect directions when comparing CSeQTL results (top eQTL per gene, q-value cutoff of 0.1 or 0.005) with two scRNA-seq eQTL studies (p value cutoff of 0.01).
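The observed-versus-expected overlap ratio and the two-sided Fisher's exact test in panels c and f can be sketched with the standard library alone. This assumes the expected overlap is computed under independence from the number of genes tested in both studies (expected = n_a × n_b / N); the caption does not state the exact background gene set, and the function names are hypothetical. The two-sided p value sums the probabilities of all tables with the same margins that are no more likely than the observed one, the convention used by R's `fisher.test` and SciPy's `scipy.stats.fisher_exact`.

```python
from math import comb


def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact test p value for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    denom = comb(n, col1)
    # Hypergeometric probability of each table with top-left cell k (margins fixed)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = {k: comb(row1, k) * comb(row2, col1 - k) / denom
             for k in range(lo, hi + 1)}
    p_obs = probs[a]
    # Sum over tables no more likely than the observed one (small relative tolerance)
    p = sum(pr for pr in probs.values() if pr <= p_obs * (1 + 1e-7))
    return min(p, 1.0)


def overlap_enrichment(n_genes, n_a, n_b, n_overlap):
    """Observed/expected overlap ratio and Fisher p value for two eGene sets.

    n_genes: genes tested in both studies (assumed background)
    n_a, n_b: eGenes detected by each study; n_overlap: eGenes detected by both
    """
    expected = n_a * n_b / n_genes
    table = [[n_overlap, n_a - n_overlap],
             [n_b - n_overlap, n_genes - n_a - n_b + n_overlap]]
    return n_overlap / expected, fisher_exact_two_sided(table)


ratio, p = overlap_enrichment(n_genes=8, n_a=4, n_b=4, n_overlap=3)
print(ratio, round(p, 4))  # 1.5 0.4857
```

Python's exact integer arithmetic in `math.comb` keeps the hypergeometric probabilities accurate even for genome-scale gene counts, at the cost of speed compared with a log-space implementation.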
Fig. 4. GWAS enrichment for CMC and GTEx brain.
Black diamonds correspond to point estimates of log enrichment of eQTLs among GWAS hits, while open and filled circles (the centers of error bars) correspond to jackknife estimates of log enrichment. The block jackknife-based 95% confidence intervals are derived from sorting and grouping genes and loci into n = 200 blocks. Intervals are converted to nominal p values that are then Bonferroni corrected. Filled circles correspond to the ones with lower bound of confidence intervals larger than zero and adjusted p values < 0.05.
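The block jackknife procedure described in this caption (and repeated for Fig. 5) can be sketched as follows. Several details are assumptions rather than the authors' exact pipeline: log enrichment is defined here as log[P(eQTL | GWAS hit) / P(eQTL)] over loci, loci are assumed to be already sorted (for example by genomic position) so that contiguous chunks form the 200 blocks, and both the 95% interval (±1.96 SE) and the nominal p value use a normal approximation. All function names are hypothetical.

```python
import math

import numpy as np


def log_enrichment(is_eqtl, is_gwas):
    """Hypothetical statistic: log of P(eQTL | GWAS hit) / P(eQTL)."""
    return math.log(is_eqtl[is_gwas].mean() / is_eqtl.mean())


def block_jackknife(is_eqtl, is_gwas, n_blocks=200):
    """Delete-one-block jackknife estimate, 95% CI, and nominal p value."""
    is_eqtl = np.asarray(is_eqtl, dtype=bool)
    is_gwas = np.asarray(is_gwas, dtype=bool)
    n = is_eqtl.size
    theta_full = log_enrichment(is_eqtl, is_gwas)  # point estimate (diamond)
    # Contiguous blocks of the (already sorted) loci
    blocks = np.array_split(np.arange(n), n_blocks)
    leave_out = np.empty(n_blocks)
    for b, idx in enumerate(blocks):
        keep = np.ones(n, dtype=bool)
        keep[idx] = False
        leave_out[b] = log_enrichment(is_eqtl[keep], is_gwas[keep])
    # Jackknife pseudo-values; their mean is the jackknife estimate (circle)
    pseudo = n_blocks * theta_full - (n_blocks - 1) * leave_out
    est = pseudo.mean()
    se = pseudo.std(ddof=1) / math.sqrt(n_blocks)
    ci = (est - 1.96 * se, est + 1.96 * se)
    # Two-sided normal p value for H0: log enrichment = 0
    p = math.erfc(abs(est / se) / math.sqrt(2))
    return theta_full, est, ci, p


def bonferroni(pvals):
    """Bonferroni adjustment: multiply by the number of tests, cap at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


# Simulated loci with eQTLs over-represented among GWAS hits
rng = np.random.default_rng(1)
n_loci = 4000
is_gwas = rng.random(n_loci) < 0.2
is_eqtl = rng.random(n_loci) < np.where(is_gwas, 0.3, 0.1)
theta, est, (lo, hi), p = block_jackknife(is_eqtl, is_gwas)
p_adj = bonferroni([p, 0.2, 0.5])[0]
filled = lo > 0 and p_adj < 0.05  # "filled circle" criterion from the caption
print(round(theta, 3), round(est, 3), filled)
```

Grouping loci into contiguous blocks, rather than dropping single loci, is what lets the jackknife variance account for correlation between nearby loci, such as linkage disequilibrium among neighboring variants.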
Fig. 5. GWAS enrichment for BLUEPRINT and GTEx whole blood.
Black diamonds correspond to point estimates of log enrichment of eQTLs among GWAS hits, while open and filled circles (the centers of error bars) correspond to jackknife estimates of log enrichment. The block jackknife-based 95% confidence intervals are derived from sorting and grouping genes and loci into n = 200 blocks. Intervals are converted to nominal p values that are then Bonferroni corrected. Filled circles correspond to the ones with lower bound of confidence intervals larger than zero and adjusted p values < 0.05.
