Nature. 2020 Jul;583(7818):729-736.
doi: 10.1038/s41586-020-2528-x. Epub 2020 Jul 29.

Global reference mapping of human transcription factor footprints

Jeff Vierstra et al. Nature. 2020 Jul.

Abstract

Combinatorial binding of transcription factors to regulatory DNA underpins gene regulation in all organisms. Genetic variation in regulatory regions has been connected with diseases and diverse phenotypic traits [1], but it remains challenging to distinguish variants that affect regulatory function [2]. Genomic DNase I footprinting enables the quantitative, nucleotide-resolution delineation of sites of transcription factor occupancy within native chromatin [3–6]. However, only a small fraction of such sites have been precisely resolved on the human genome sequence [6]. Here, to enable comprehensive mapping of transcription factor footprints, we produced high-density DNase I cleavage maps from 243 human cell and tissue types and states and integrated these data to delineate about 4.5 million compact genomic elements that encode transcription factor occupancy at nucleotide resolution. We map the fine-scale structure within about 1.6 million DNase I-hypersensitive sites and show that the overwhelming majority are populated by well-spaced sites of single transcription factor-DNA interaction. Cell-context-dependent cis-regulation is chiefly executed by wholesale modulation of accessibility at regulatory DNA rather than by differential transcription factor occupancy within accessible elements. We also show that the enrichment of genetic variants associated with diseases or phenotypic traits in regulatory regions [1,7] is almost entirely attributable to variants within footprints, and that functional variants that affect transcription factor occupancy are nearly evenly partitioned between loss- and gain-of-function alleles. Unexpectedly, we find increased density of human genetic variation within transcription factor footprints, revealing an unappreciated driver of cis-regulatory evolution. Our results provide a framework for both global and nucleotide-precision analyses of gene regulatory mechanisms and functional genetic variation.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. A nucleotide-resolution atlas of TF occupancy on the human genome.
a, DNase I cleavage patterns (RELB locus in CD8+ T cells). Top, windowed DNase I cleavage density. Below, per-nucleotide cleavage and footprint posterior probabilities within two DHSs. b, Heat map of footprint posterior probabilities integrating 243 biosamples. Rows are individual biosamples grouped by tissue or organ systems; columns are individual nucleotides. Black fills to right of heat maps indicate overlapping DHSs in biosample. Below, DHS sequence scaled by footprint prevalence. Grey boxes, consensus footprints in one or more cell or tissue types (footprint posterior >0.99). c, Consensus map of TF occupancy derived from 243 biosamples covering 1.6 million DHSs providing expansive nucleotide-resolution annotation of regulatory DNA. d, Proportion of DHSs with footprints at given sequencing depth. Dashed red lines and dot show read depth (tags per 250 million uniquely mapped reads) at which footprint is detected in 90% of DHSs. e, Histogram of footprint location relative to DHS peak summit. Dashed red lines represent average size of a DHS peak (203 bp).
Fig. 2. Footprints encapsulate topological structures of individual TF–DNA interactions.
a, Structure of CTCF zinc fingers 3–11 bound to cognate DNA recognition sequence (Protein Data Bank (PDB) codes: 5YEF and 5YEL). DNA coloration shows mean observed versus expected cleavage at footprinted CTCF motifs (in T regulatory cells). b, Heat map of relative cleavage at each of 25,852 footprinted CTCF motifs (posterior probability >0.99). Below, aggregate (summed) nuclease cleavage relative to footprinted motifs. Right, nuclease cleavage (observed and expected) at three footprints randomly selected across genome. c, Footprint width is tightly correlated with the width of the TF recognition sequence (Spearman’s ρ = 0.9, P = 0.001).
Fig. 3. Modes of TF occupancy within regulatory DNA.
a, Overlap and spatial enrichment of TF recognition sequences within footprints binned by width. Left, density heat map of motif occurrences around footprints binned by width. Right, proportion of footprints uniquely overlapped by 0, 1 or 2 or more recognition sequences. b, Percentage of footprints representing occupancy of single TF (≤30 bp) or multiple TFs (>30 bp). c, d, Footprint density and spacing (edge-to-edge) plotted against mean normalized cleavage density (tags per 150 bp per million reads) within promoter-proximal and distal DHSs. Solid lines and shaded regions indicate median and middle 50th percentile, respectively. e, A typical DHS contains about five or six directly bound TFs spaced roughly 20 bp apart.
Fig. 4. Functional genetic variation localizes in TF footprints.
a, Allelic imbalance assessed at 1.65 million variants discovered from 147 unique individuals represented in 243 biosamples. b, Percentage of variants imbalanced in footprinted and non-footprinted segments of DHS peaks. c, Variant rs10171498 (C→G; C allele ancestral) creates a de novo NFIX footprint. Top, allelically resolved per-nucleotide DNase cleavage aggregated from 56 heterozygotes. Middle, DNase cleavage in two samples homozygous for reference or alternative alleles. Bottom, mean differential per-nucleotide cleavage (log2) between homozygous reference (n = 74) and alternative allele samples (n = 12). Colour indicates statistical significance (–log10 P) of per-nucleotide differential test (Methods). Variant and differentially footprinted nucleotides precisely colocalize at the NFIX element. d, Histogram of allelic ratios for variants that overlap the footprinted NFIX recognition sequence. Grey, all variants tested (n = 7,110). Blue, significantly imbalanced variants (n = 1,889). Prop., proportion. e, Scatter plot of allelic imbalance in heterozygous individuals (x-axis) against relative difference in footprint depth between homozygous individuals at variants overlapping an NFIX footprint. Each point shows an individual SNV within the footprinted NFIX binding site that is both imbalanced (q < 0.2) in heterozygotes and differentially footprinted (nominal P < 0.05) in homozygotes. Grey line, fitted linear model. f, Allelic imbalance versus predicted energetic effects of variants within NFIX footprints. Shown is median log-odds score (reference versus alternate allele) of all tested variants within footprinted motifs binned by allelic ratio. Error bars show 5th and 95th percentiles of log-odds motif scores in each bin.
Fig. 5. Footprints are highly polymorphic in human populations.
a, Histogram of mean phyloP scores within consensus footprints. Dashed red line, phyloP = 1. b, Nucleotide diversity (π) within footprints (mean ± 95% confidence interval of mean). Yellow bar, 95% confidence interval of mean π computed at fourfold degenerate coding sites. c, Mean nucleotide diversity versus distance from consensus footprints. Shaded bar represents average footprint width (14 bp). d, Density of observed and expected rare variation within footprints. Top, variants with minor allele frequencies (maf) <0.0001. Bottom, expected rare variation computed using 7-mer sequence context mutation rate model.
Fig. 6. Trait-associated variation is concentrated in consensus footprints.
a, Enrichment of GWAS variants within or outside consensus footprints versus randomly sampled 1,000 Genomes Project (1KGP) variants, after expanding both with variants in perfect LD (r² = 1.0, central European population). Centre lines, median; boxes, interquartile range (IQR); whiskers, 5th and 95th percentile (of enrichments from 1,000 sampling iterations). Statistical significance determined by a normal distribution fitted to sampled data (Supplementary Methods). b, c, Enrichment of SNP-based trait heritability using LD-score regression for UK BioBank (UKBB) GWAS on lymphocyte count (b) and red blood cell count (c). Pr(h²), proportion of trait narrow-sense heritability explained by SNPs overlapping DHSs or footprints. Pr(SNPs), proportion of total SNPs within each annotation. Asterisk, enrichment P < 0.01.
Extended Data Fig. 1. Statistical modelling of DNase I cleavage variation and footprint detection within a single dataset.
a, A negative binomial model was fit from the distribution of observed cleavage counts for each predicted cleavage rate. Shown are histograms of observed cleavage counts in CD19+ B cells at all genomic sites with 5, 25, or 60 expected cleavages. Red, the negative binomial distribution fit to the observed data using maximum likelihood estimation (Supplementary Methods). Blue, Poisson distribution with λ set to the corresponding expected cleavage rate. Lower right panel, means of fitted negative binomial distributions vs means of observed cleavage rates. Dashed grey line indicates y = x for reference. b, Estimated power of empirical cleavage dispersion model. Computed P values for different cleavage rate effect sizes with respect to expected cleavage rates in CD19+ B cells. Coloured lines represent the modelled effect size (depletion of cleavages) relative to the expected rate corresponding to a hexamer sequence model. c, Example of footprint detection within promoters for TMEM143 and SYNGR4 in CD19+ B cells. Expected cleavages were generated by reassigning observed cleavages according to a hexamer cleavage model (Supplementary Methods). The significance of difference between the observed and expected cleavages was evaluated per nucleotide using the negative binomial dispersion model. Individual P values are combined in 7-bp windows using Stouffer’s Z-score method. Per-nucleotide false discovery rates were computed by sampling from the expected null distributions. d, Autocorrelation of P values sampled from the expected negative binomial distribution. e, f, Histogram of windowed P values for observed (e) and sampled (f) data. g, Observed and sampled P values compared to empirically determine and calibrate false-positive rates (FPR).
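The windowing step in panel c (per-nucleotide P values combined in 7-bp windows with Stouffer's Z-score method) can be illustrated with a minimal sketch. It is not the authors' implementation: the centred window, the equal weights, the truncation of windows at the ends of the region, and the example P values are all assumptions made for illustration, and the sketch treats neighbouring P values as independent, whereas the caption notes that P values are autocorrelated and that false-positive rates were calibrated empirically.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, s.d. 1


def stouffer_combine(p_values):
    """Combine one-sided P values with Stouffer's unweighted Z-score method."""
    # Clip so that P values of exactly 0 or 1 do not map to infinite Z-scores.
    clipped = [min(max(p, 1e-12), 1.0 - 1e-12) for p in p_values]
    # A small P value (strong cleavage depletion) maps to a large positive Z.
    z_scores = [-STD_NORMAL.inv_cdf(p) for p in clipped]
    combined_z = sum(z_scores) / sqrt(len(z_scores))
    # Upper-tail probability of the combined Z-score.
    return STD_NORMAL.cdf(-combined_z)


def windowed_p_values(p_values, width=7):
    """Return one combined P value per nucleotide from a centred window.

    Windows are truncated at the ends of the region (an assumption).
    """
    half = width // 2
    combined = []
    for i in range(len(p_values)):
        lo = max(0, i - half)
        hi = min(len(p_values), i + half + 1)
        combined.append(stouffer_combine(p_values[lo:hi]))
    return combined


# Hypothetical per-nucleotide P values: background flanking a protected core.
per_nt = [0.6, 0.5, 0.7, 0.04, 0.03, 0.05, 0.02, 0.04, 0.03, 0.05, 0.6, 0.5]
windowed = windowed_p_values(per_nt)
```

Several modest per-nucleotide P values that are consistently low across a window combine into a much smaller windowed P value, which is why windowing helps to detect contiguous protected stretches.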
Extended Data Fig. 2. Genomic footprints are reproducible, overlap evolutionarily constrained nucleotides, and are enriched for TF recognition sequences.
a, Motif density is associated with footprint strength. Plotted is the overlap of motif recognition sequence matches (P < 0.0001) with nucleotides ranked by footprint P value from CD19+ B cells. b, As in a, but for per-nucleotide evolutionary conservation (phyloP). c, Scatter plot of per-nucleotide footprint P values for replicate experiments from the same cell line (NAMALWA Burkitt’s lymphoma cells). All individual nucleotides within FDR 1% footprints in either replicate were considered for correlation analysis. d, As in c, but for replicates of the same primary cell (CD8+ T cells) between two distinct individuals. e, Pearson’s correlation between replicate pairs grouped by whether they were derived from the same cell and individual (n = 43) or were the same primary cell or tissue from different individuals (n = 111). Boxes indicate median and interquartile range (IQR). Whiskers, 5th and 95th percentile.
Extended Data Fig. 3. Characteristics of consensus derived genomic footprints.
a, Histogram of footprint discovery concordance (Jaccard similarity) between replicates (same cell type, same individual) using footprints called either independently (FDR 1%) (light orange) or with the empirical Bayes approach (posterior probability >0.99). b, Genomic distribution of consensus footprints stratified by DNase I signal of the encompassing DHS. c, Enrichment of footprint overlap in gene annotations vs. the distribution of DHSs. d, Effect of sequencing depth on footprint detection in CD8+ T cells. Sequencing tags were randomly subsampled (90% down to 10%) from the complete library. Footprints were called independently on each subsampled set of tags. Plotted is the number of FDR 1% footprints detected vs subsampled sequencing depth. Red dashed line, linear model fit. e, Same as d for CD4+ T cells. f, g, Contribution of individual samples to the consensus footprint index. Datasets were ordered randomly, and the collective number of footprints was computed after the footprints present within each additional dataset were considered. f, Mean total number of consensus footprints detected vs. number of datasets included after 100 iterations of random dataset orderings. Red dashed line shows a logarithmic curve fit (excluding first 50 samples). Grey dashed line indicates number of datasets that recapitulate 50% of consensus footprints. g, Mean number of novel footprints detected after the sequential addition of each sample.
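The Jaccard similarity used in panel a as a concordance measure can be sketched as follows. The caption does not say whether concordance was computed over footprint intervals or over covered nucleotides; this sketch assumes the per-nucleotide version, with footprints given as half-open (start, end) coordinates on a single chromosome, and the example coordinates are hypothetical.

```python
def covered_positions(footprints):
    """Expand half-open (start, end) intervals into a set of positions."""
    positions = set()
    for start, end in footprints:
        positions.update(range(start, end))
    return positions


def jaccard_similarity(footprints_a, footprints_b):
    """Per-nucleotide Jaccard similarity: |A ∩ B| / |A ∪ B|."""
    a = covered_positions(footprints_a)
    b = covered_positions(footprints_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty call sets
    return len(a & b) / len(union)


# Hypothetical footprint calls from two replicates of the same biosample.
replicate_1 = [(100, 114), (150, 162)]
replicate_2 = [(102, 114), (150, 160), (200, 210)]
concordance = jaccard_similarity(replicate_1, replicate_2)
```

Expanding intervals into sets is adequate for a small region; genome-wide comparisons would normally use sorted interval arithmetic instead.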
Extended Data Fig. 4. Clustering motifs by similarity.
a, Outline of motif clustering approach. Motif models (n = 2,174) from ref. , JASPAR (2018), and HOCOMOCO were clustered by motif similarity. b, Hierarchically clustered heat map of the pairwise similarity scores between motifs. The cluster dendrogram was cut at height 0.7 to create non-redundant archetypal clusters of motifs. c–f, Exemplar clusters of similar TF recognition sequences corresponding to KLF/SP (C2H2 family), EGR (C2H2 family), MEF2 (MADS) and E-box/CATATG (bHLH).
Extended Data Fig. 5. Classification of ChIP–seq data by genomic footprinting.
a, Precision-recall (PR) curve for predictions of CTCF motif occupancy (that is, overlap with a CTCF ChIP–seq peak) based on footprint posterior probabilities in CD20+ B cells. Black dot indicates precision and recall at posterior footprint probability threshold of >0.99. Blue, PR curve computed after shuffling ChIP–seq peak labels. b, Area under precision-recall curve (AUPR) computed for 21 ENCODE cell types and/or replicates (n = 33 total datasets). c, Distribution of MOODS scores stratified by motif overlap with a ChIP–seq peak and/or a genomic footprint in CD20+ B cells. d, ChIP–seq signal intensity at peaks overlapping a footprinted CTCF motif vs. non-footprinted motifs in CD20+ B cells. e, Relative ChIP–seq signal at footprinted and non-footprinted CTCF peaks containing a motif across 21 cell types (n = 32 total datasets). f–m, PR curves and relative ChIP–seq intensities for ATF3 (f, g), GATA1 (h, i), GABPA (j, k) and NFE2 (l, m) in K562 cells. For all TFs analysed, only motifs overlapping DHSs were considered. DHS, ChIP–seq and motif models are described in Supplementary Table 3. Boxes indicate median and IQR. Whiskers, 5th and 95th percentiles.
Extended Data Fig. 6. Aggregate DNase I cleavage profiles for diverse TFs.
a, Physical structure of the paired-box TF PAX6 bound to its cognate recognition element (PDB: 6PAX). b–e, Per-nucleotide DNase I cleavage patterns surrounding instances of motifs within genomic footprints for PAX6 (fetal eye) (b); EBF1 (differentiated bipolar neuron) (c); SOX3 (differentiated neuronal cell) (d); and MYF6 (fetal tongue) (e). For each, top left shows a randomly ordered heat map of the per-nucleotide relative DNase I cleavage protection (observed/expected) for each footprinted motif instance (posterior footprint probability >0.99). Below, aggregate DNase I protection averaged over all footprinted sites. Right, DNase I cleavage at individual motif instances (blue, observed cleavage; yellow, expected cleavage).
Extended Data Fig. 7. Cell-selective occupancy of TF recognition sequences.
a, Hierarchically clustered heat map of TF recognition sequence enrichment (–log10 q values; Supplementary Methods) overlapping consensus footprints. Rows correspond to motifs and columns correspond to individual samples. b, Clustered heat maps of posterior probabilities for footprints (left) overlapping an E-box/CAGCTG (MYF6_bHLH_1 motif model) and their corresponding DNase I density (right) in each sample. Rows and columns are ordered using K-means (k = 6) and hierarchical clustering, respectively. c, d, Same as b, for footprints overlapping an E-box/CATATG (c, Neurog1_MA0623.1 motif model) or MEIS (d, MEIS1_MEIS_1 motif model) recognition sequence.
Extended Data Fig. 8. Comparative footprinting identifies cell-selective TF occupancy at nucleotide resolution.
a, b, Comparative footprinting within the SCAMP5 (a) and UCP2 (b) promoters identifies footprints that are differentially occupied in nervous cell and tissue types. Top, DNase I cleavage in two exemplar nervous and non-nervous cell types. Bottom, mean differential per-nucleotide cleavage (log2 observed/expected) between nervous system-derived (SCAMP5: n = 26; UCP2: n = 28 out of 31) and non-nervous samples (SCAMP5: n = 151; UCP2: n = 189 out of 212) in which the region is DNase I hypersensitive (Supplementary Methods). The colour of each bar indicates the statistical significance (–log10 P) of the per-nucleotide differential test.
Extended Data Fig. 9. Differentially occupied nucleotides reflect aggregate DNase I cleavage profiles.
a, Differential footprint testing within thousands of accessible DHSs between nervous and non-nervous related biosamples. The vast majority of tested DHSs encode a single TF binding topology. Top, percentage of the DHSs tested that contain one or more differentially occupied elements. Bottom left, distribution of differentially footprinted elements per DHS. Bottom right, selected TF recognition sequences significantly enriched in differentially occupied footprints (binomial test P < 0.01). Indicated in parentheses is the fold-enrichment vs expected (based on prevalence of footprinted motif in tested regions). b, Density histograms of relative footprint occupancy between nervous-system derived and non-nervous-system derived samples for the TF recognition sequences of REST, NFIB, ZIC1 and EBF1. Grey indicates distribution of all motif instances tested. Black indicates differentially footprinted. c, Per-nucleotide aggregate plots of the mean relative DNase I protection (top) and differential test P value (–log10, bottom) around differentially occupied motifs. d, Cell- and tissue-specific expression of genes near differentially occupied REST footprints. Enrichment was performed using Enrichr. Shown are cells and tissues with an adjusted Fisher exact test P value <0.01.
Extended Data Fig. 10. Detection of chromatin-altering variants.
a, Scatter plot of allelic ratios at 100 randomly selected high-confidence SNVs (Supplementary Methods) computed after aggregating reads from different samples (x-axis) against the distribution of allelic ratios at the same SNVs in each sample (y-axis; mean ± s.d.). The average s.d. indicated in the top left corner was used to tune the parameters of a beta-binomial distribution. b, Simulation of allelic ratios from the observed total read depth at high-confidence SNVs assuming a binomial distribution (P = 0.5) or a beta-binomial distribution. Grey indicates the observed allelic ratios at the same variants. c, Density histogram of allelic ratios for all tested SNVs (grey line) and significantly imbalanced SNVs (blue line). d, Proportion of SNVs imbalanced with respect to read depth for variants within (blue) or outside (orange) consensus footprints (posterior probability >0.99).
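The contrast in panel b between a binomial model (P = 0.5) and a beta-binomial model of allelic read counts can be illustrated with a short sketch. It is not the authors' procedure: the symmetric shape parameters, the concentration value of 10, the two-sided test construction and the read counts are all hypothetical, chosen only to show how overdispersion makes the same allelic ratio less surprising.

```python
from math import comb, exp, lgamma


def _log_beta(x, y):
    """Logarithm of the beta function B(x, y)."""
    return lgamma(x) + lgamma(y) - lgamma(x + y)


def beta_binomial_pmf(k, n, alpha, beta):
    """P(k reference reads out of n) under a beta-binomial model."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_choose + _log_beta(k + alpha, n - k + beta) - _log_beta(alpha, beta))


def binomial_pmf(k, n, p=0.5):
    """P(k reference reads out of n) under a binomial model."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def two_sided_p(k, n, pmf):
    """Sum the probabilities of all outcomes no more likely than the observed one."""
    observed = pmf(k, n)
    tolerance = 1 + 1e-9  # guards against floating-point ties
    total = sum(pmf(j, n) for j in range(n + 1) if pmf(j, n) <= observed * tolerance)
    return min(1.0, total)


CONCENTRATION = 10.0  # hypothetical; alpha = beta keeps the mean ratio at 0.5


def overdispersed_pmf(k, n):
    return beta_binomial_pmf(k, n, CONCENTRATION, CONCENTRATION)


# Hypothetical heterozygous site: 18 of 20 reads carry the reference allele.
ref_reads, total_reads = 18, 20
p_binomial = two_sided_p(ref_reads, total_reads, binomial_pmf)
p_beta_binomial = two_sided_p(ref_reads, total_reads, overdispersed_pmf)
```

Under these assumptions the beta-binomial P value is larger than the binomial one for the same counts, so a test that ignores overdispersion would call more variants imbalanced.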
Extended Data Fig. 11. Enrichment of imbalanced variants within footprinted TF recognition sequences.
a–d, Distribution of SNVs around the recognition sequences for CREB/ATF, NFI, TEAD and TFAP2 TFs. For each TF, shown are the total SNVs tested for imbalance (top), imbalanced variants (middle), and the proportion of variants imbalanced stratified by overlap with a consensus footprint. e, log2 enrichment of imbalanced variants residing within TF recognition sequences relative to non-imbalanced SNVs for both variants within footprinted (blue) and non-footprinted motifs (orange). Motifs are grouped into clusters, where each point represents an individual motif model (Extended Data Fig. 4, Supplementary Table 2 and Supplementary Methods). Black bars indicate mean enrichment across all motifs in each cluster and footprint overlap. Only motifs with significant (q < 0.05) enrichment of imbalanced SNVs with a footprinted recognition sequence are shown.
Extended Data Fig. 12. Allelic imbalance parallels the predicted energetic effect of genetic variation.
a, Histogram of allelic ratios for variants overlapping footprinted CREB1 (CREB/ATF), TEAD1 (TEAD) and TFAP2C (TFAP2) recognition sequences. Grey line, all variants tested for imbalance. Blue line, all variants significantly imbalanced. b, Median log-odds score (reference versus alternate allele) of all tested variants within footprinted motifs binned by allelic ratio. Error bars show 5th and 95th percentiles of log-odds motif scores in each bin.
Extended Data Fig. 13. Nucleotide-resolved patterns of genetic variation within TF binding sites.
a–c, Per-nucleotide profiles of phyloP scores (top) and human nucleotide diversity (π) (bottom) within footprinted motifs for ETS1 (a), JDP2 (b), and CTCF (c) motifs. Black box in the motif consensus logo annotates CpG dinucleotides. Asterisk indicates the position of the CpG dinucleotide in the profiles below. d–f, Ancient and recent constraints at the TF recognition sequences with respect to proximity to DHSs and consensus footprints. TF recognition sequences are grouped by those residing within ±5 kb of DHS peaks but not inside (outside DHS), inside DHSs but not a footprint (outside footprint) and those overlapping a consensus footprint. Top, percentage of TF recognition sequence under elevated evolutionary constraint (mean phyloP score >1) in each group. Bottom, mean nucleotide diversity within the footprinted motifs additionally stratified by evolutionary constraint. Boxes indicate median and IQR of enrichments from 1,000 bootstrap samples. Whiskers, 5th and 95th percentile.

References

    1. Maurano, M. T. et al. Systematic localization of common disease-associated variation in regulatory DNA. Science 337, 1190–1195 (2012).
    2. Maurano, M. T. et al. Large-scale identification of sequence variants influencing human transcription factor occupancy in vivo. Nat. Genet. 47, 1393–1401 (2015).
    3. Hesselberth, J. R. et al. Global mapping of protein-DNA interactions in vivo by digital genomic footprinting. Nat. Methods 6, 283–289 (2009).
    4. Neph, S. et al. An expansive human regulatory lexicon encoded in transcription factor footprints. Nature 489, 83–90 (2012).
    5. Boyle, A. P. et al. High-resolution genome-wide in vivo footprinting of diverse transcription factors in human cells. Genome Res. 21, 456–464 (2011).