Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
[Preprint]. 2024 Dec 17:2023.12.15.571696.
doi: 10.1101/2023.12.15.571696.

Population-scale skeletal muscle single-nucleus multi-omic profiling reveals extensive context specific genetic regulation

Affiliations

Population-scale skeletal muscle single-nucleus multi-omic profiling reveals extensive context specific genetic regulation

Arushi Varshney et al. bioRxiv. .

Abstract

Skeletal muscle, the largest human organ by weight, is relevant in several polygenic metabolic traits and diseases including type 2 diabetes (T2D). Identifying genetic mechanisms underlying these traits requires pinpointing cell types, regulatory elements, target genes, and causal variants. Here, we use genetic multiplexing to generate population-scale single nucleus (sn) chromatin accessibility (snATAC-seq) and transcriptome (snRNA-seq) maps across 287 frozen human skeletal muscle biopsies representing nearly half a million nuclei. We identify 13 cell types and integrate genetic variation to discover >7,000 expression quantitative trait loci (eQTL) and >100,000 chromatin accessibility QTLs (caQTL) across cell types. Learning patterns of e/caQTL sharing across cell types increased precision of effect estimates. We identify high-resolution cell-states and context-specific e/caQTL with significant genotype by context interaction. We identify nearly 2,000 eGenes colocalized with caQTL and construct causal directional maps for chromatin accessibility and gene expression. Almost 3,500 genome-wide association study (GWAS) signals across 38 relevant traits colocalize with sn-e/caQTL, most in a cell-specific manner. These signals typically colocalize with caQTL and not eQTL, highlighting the importance of population-scale chromatin profiling for GWAS functional studies. Finally, our GWAS-caQTL colocalization data reveal distinct cell-specific regulatory paradigms. Our results illuminate the genetic regulatory architecture of human skeletal muscle at high resolution epigenomic, transcriptomic, and cell-state scales and serve as a template for population-scale multi-omic mapping in complex tissues and traits.

PubMed Disclaimer

Conflict of interest statement

4.32Competing interests SCJP has a research grant from Pfizer.

Figures

Figure 1:
Figure 1:. snRNA and snATAC -seq data generation and integration identifies 13 high quality cell-type clusters
(A) Study design including sample processing, snRNA and snATAC -seq profiling, and analyses. (B) UMAP plot showing the 13 identified clusters after jointly clustering the snRNA and snATAC modalities. (C) Cluster abundance shown as percentage of total nuclei. (D) Cluster proportions across samples and modalities. Bottom row denotes the processing batch number (1–10) for samples, indicating that the proportions are not driven by batch effects. (E) Gene expression (post ambient-RNA adjustment) in clusters for known marker genes for various cell-types. (F) Identification of cell-type-specific genes across clusters. Five related muscle fiber clusters (type 1, 2a, 2x, neuromuscular junction and muscle fiber mixed were taken together as a“muscle fiber” cell type). (G) GO term enrichment for cell-type-specific genes identified in (F), showing two GO terms for each cluster. (H) snATAC-seq profiles over known marker genes in clusters. (I) Comparison of snATAC-seq peaks identified for clusters in this study with reference data across various cell-types from the Zhang et al. [42] scATAC-seq atlas. Gray cells denote no overlaps between cell-type specific peaks in our dataset and those in the Zhang et al dataset.
Figure 2:
Figure 2:. Thousands of e/caQTLs identified in clusters
(A) UpSet plot showing eGenes, and (B) caPeaks in clusters (FDR<5%) (C) An example caQTL. Heatmap shows normalized snATAC-seq reads across samples in the type 1 cluster, separated by caSNP rs12336284 genotype classes. Aggregate profiles by genotype are shown on top. Examples of shared and cluster-specific caQTL are shown in (D) , (E) , and (F) . Top two rows show snATAC-seq profiles by the caSNP genotype in type 1 and FAP cell types, followed by aggregate snATAC profiles across clusters. (G) Reconstruction of the BACH_1 TF motif using caQTL data. From top, row 1: original motif PWM. Row 2: genetically reconstructed motif PWM. For all BACH_1 motifs occurring in type 1 snATAC-seq peaks (peak-motifs) that also overlapped type 1 caSNPs, alleles associated with higher chromatin accessibility (“favored alleles”) were quantified using the caQTL aFC, followed by PWM generation. Row 3: favored allele counts for caSNPs in BACH_1 peak-motifs. Row 4: PWM reconstructed using the nucleotide counts for all heterozygous SNPs overlapping the BACH_1 peak-motifs. Row 5: nucleotide counts for all heterozygous SNPs in the BACH_1 peak-motifs. (H) Comparison of caSNP effect size and MAF with the replication of snATAC-seq peaks in a reference scATAC dataset. (I) Allelic fold change for type 1 e/caSNPs that overlap skeletal muscle active TSS or stretch enhancer chromatin states. P values from a two-sided Wilcoxon rank sum test.
Figure 3:
Figure 3:. Learning patterns of e/caQTLs signal sharing across clusters inform effect estimates
(A) Fitting a mash model and estimating effects across clusters, UpSet plots show the number of shared and specific eGenes, and (B) caPeaks at a local false sign rate (lfsr)< 5%. (C) Fraction of eQTL or (D) caQTL effect estimates with the same sign for each pair of clusters. (E) Example eQTL and (F) caQTL showing original effects (slope) from the QTL scan and the effects estimated from mash. Bars show 95% confidence intervals. For the original eQTL results, standard errors are calculated from qvalues correcting for the total numbers of features tested after a Benjamini-Hochberg correction (hence equivalent of Mashr lfsr). For the Mashr results, estimate is the posterior mean, and error bars depict ± 1.96 * posterior standard deviations. Orange color highlights estimates where CIs don’t overlap zero.
Figure 4:
Figure 4:. Identifying state-specific e/caQTL in endothelial cluster by testing genotype by context interaction
(A) Subclustering of the endothelial nuclei. Left: snRNA UMAP plot showing discrete subcluster contexts; right: snRNA UMAP plots show five latent factors as continuous contexts. (B) eGene examples with significant G×C interaction with subclusters (left) or factors (right) as context. (C) snATAC UMAP plot showing endothelial subclusters. Due to sparsity of snATAC data, counts were pseudobulked by sample within each subcluster prior to testing for G×C interaction. (D) caPeak examples with significant G × subcluster interaction.
Figure 5:
Figure 5:. e/caQTL finemapping, colocalization and causal inference informs regulatory grammar in clusters
(A) Fraction of finemapped eQTL and (B) caQTL signals by the 95% credible set size. Probability of (C) eSNPs overlapping snATAC peaks relative to the eSNP PIPs; and (D) caSNPs overlapping the caPeak relative to the caSNP PIPs. Gray lines and confidence intervals are obtained after from shuffling e/caSNP PIPs. (E) eQTL-caQTL pairs with lead SNPs within 100kb in each cluster were tested for colocalization. Heatmap shows the posterior probability of shared causal variant (PP H4) from coloc v5. (F) TF motif enrichment in caPeaks that colocalize with eGenes relative to all caPeaks in a cluster. Clusters with at least 100 colocalized caPeaks are shown. * denotes significant logistic regression coefficient (5% FDR). (G) For each colocalized eGene-caPeak pair, causal inference tests (CIT) can inform the causal direction - Chromatin accessibility over gene expression (ca-to-e) or vice versa (e-to-ca) using e/ca SNPs as instrument variables. Barplot shows the percentage of colocalized eGene-caPeak pairs where the putative causal direction could be determined consistently from CIT and MR Steiger directionality test (5% FDR). (H) Logistic regression modeling the causal direction between caPeak-eGene pairs with whether the caPeak lies within the eGene body, along with eGene expression (TPM,) caPeak height (RPM), and GC content. (I) Probability that a caSNP lies in the caPeak relative to caSNP PIP bins. Colors depict if the caPeak was inferred as ca-to-e or e-to-ca from CIT. (J) Where multiple caPeaks colocalize with an eGene, CIT can help delineate causal direction. (K) At the GSDME locus, caQTLs for a distal-peak and a TSS-peak both colocalized with the eQTL. Type 1 snATAC-seq signal track by rs10276677 genotype at this locus shows the distal-caPeak, TSS-caPeak and the GDSME gene TSS. Aggregate snATAC-seq in clusters are shown below. (L) Locus-zoom plots show the distal-caQTL, TSS-caQTL and the GDSME eQTL. (M) Causal inference between the distal-caPeak, TSS-caPeak and the GDSME gene using rs10276677 as the instrument variable. Boxplots show inverse normalized chromatin accessibility or gene expression relative to the alternate allele dosages at rs10276677 before and after regressing out the corresponding modality.
Figure 6:
Figure 6:. Enrichment of GWAS traits in cluster snATAC peaks
(A) GWAS enrichment in cluster snATAC peak features. Heatmap shows the LDSC regression coefficient Z scores. (B) T2D GWAS Enrichment fin type 1 fiber snATAC peaks that contain a caSNP or eSNP or peaks that do not overlap e/caSNPs. Error bars represent the 95% confidence intervals. * = FDR < 5% on the regression coefficient, and . = FDR < 5% on the heritability enrichment.
Figure 7:
Figure 7:. Integrating e/caQTL signals with GWAS informs disease/trait relevant regulatory mechanisms
(A) Percentage and (B) Number of GWAS signals across traits that colocalize with e/caQTL signals across the five clusters. (C) Proportion of colocalized GWAS signals (from B) that colocalize with only caQTL or only eQTL or both e-and-caQTL. (D) Heatmaps showing T2D GWAS signal colocalization with caQTL (top) and eQTL (middle). Target gene predictions using snATAC co-accessibility (Cicero) between colocalized caPeak and gene TSS peak are shown in the bottom heatmap. * indicates that the GWAS hit also had a nominally significant eQTL P value for the Cicero-nominated gene in that cluster. (E) T2D GWAS signal at the ARL15 locus is colocalized with an FAP caQTL. The genomic locus is shown at the top, followed by zooming into a ±1kb neighborhood of the caSNP rs702634. snATAC-seq profiles in five clusters by the caSNP genotype are shown, followed by aggregate profiles across clusters. (F) Locuszoom plots showing the ARL15 GWAS signal (top) followed by the caQTL signal in five clusters. (G) Hi-C chromatin contacts at 5kb resolution imputed by EPCOT using the FAP snATAC-seq data (shown below the heatmap) in a 1Mb region over rs702634. (H) Difference in the predicted normalized chromatin contacts using FAP ATAC-seq from samples with the high accessibility genotype (GG) and low accessibility genotype (AA) at rs702634. Interactions with rs702634 highlighted in black are shown as a signal track below the heatmap. (I) Genes in the 1Mb neighborhood of the ARL15 gene. Chromatin co-accessibility scores between the caPeak and TSS peaks for the neighboring genes, classified by genotype classes at rs702634. Distance of the TSS peak to the caPeak in kb is shown in parentheses. (J) GWAS signals for T2D and insulin fold change (IFC) at the C2CD4A/B colocalize with a caQTL in type 1 and type 2a fibers. The genomic locus, snATAC-seq profiles by the caSNP genotype and aggregated profiles are shown. (K) Locuszoom plots showing the C2CD4A/B GWAS and caQTL signals. (L) Micro-C chromatin contacts imputed at 1kb resolution by EPCOT using the type 1 snATAC-seq showing rs7163757 and the neighboring 500kb region. (M) Difference in the predicted normalized chromatin contacts by rs7163757 genotype. Interactions with rs7163757 highlighted in black are shown as a signal track below. (N) A massively parallel reporter assay in the muscle cell line LHCN-M2 tested a 198bp element centered on the caSNP rs7163757. Enhancer activity is measured as log2(RNA/DNA) normalized to controls.

Similar articles

Cited by

References

    1. Frontera W. R. & Ochala J. Skeletal Muscle: A Brief Review of Structure and Function. en. Calcified Tissue International 96, 183–195 (Mar. 2015). - PubMed
    1. Kim G. & Kim J. H. Impact of Skeletal Muscle Mass on Metabolic Health. Endocrinology and Metabolism 35, 1–6 (Mar. 2020). - PMC - PubMed
    1. Morgan J. & Partridge T. Skeletal muscle in health and disease. Disease Models & Mechanisms 13, dmm042192 (Feb. 2020). - PMC - PubMed
    1. Sriwijitkamol A. et al. Effect of Acute Exercise on AMPK Signaling in Skeletal Muscle of Subjects With Type 2 Diabetes: A Time-Course and Dose-Response Study. Diabetes 56, 836–848 (Mar. 2007). - PMC - PubMed
    1. Musi N. et al. Metformin Increases AMP-Activated Protein Kinase Activity in Skeletal Muscle of Subjects With Type 2 Diabetes. Diabetes 51, 2074–2081 (July 2002). - PubMed

Publication types