[Preprint]. 2025 Mar 6:2024.04.26.591349.
doi: 10.1101/2024.04.26.591349.

Loop Catalog: a comprehensive HiChIP database of human and mouse samples

Joaquin Reyna et al. bioRxiv.

Abstract

HiChIP enables cost-effective and high-resolution profiling of chromatin loops. To leverage the increasing number of HiChIP datasets, we developed Loop Catalog (https://loopcatalog.lji.org), a web-based database featuring loop calls from 1000+ distinct human and mouse HiChIP samples from 152 studies plus 44 high-resolution Hi-C samples. We demonstrate its utility for interpreting GWAS and eQTL variants through SNP-to-gene linking, identifying enriched sequence motifs and motif pairs, and generating regulatory networks and 2D representations of chromatin structure. Our catalog spans over 4.19M unique loops and, with its embedded analysis modules, constitutes an important resource for the field.

Keywords: GWAS; HiChIP; SNP-to-gene linking; chromatin loops; database.

Conflict of interest statement

Competing interests: F.A. is an Editorial Board Member of Genome Biology.

Figures

Figure 1.
High-level summary of the Loop Catalog. A) Breakdown of HiChIP samples from 2016 to 2024. The top panel shows the number of studies broken down by human (blue), mouse (teal), or both (orange). The bottom panel shows a cumulative breakdown. B) Schema for the development of the Loop Catalog, from raw sequencing files to processing (top left), database storage (bottom), and web-accessible analyses (top right). C) Breakdown of samples in the Loop Catalog by target protein or histone modification and organism.
Figure 2.
Layout of the Loop Catalog portal. A) Screenshot of the main entry page to the Loop Catalog. B) Main data page, which includes an embedding of the WashU Epigenome Browser followed by a table of HiChIP samples with various metadata fields. C) Screenshot of a HiChIP sample page with download links, a summary of loop call statistics, and an enhancer-promoter network visualization with an accompanying table listing the detected communities and subcommunities of this network (enhancers: circles; promoters: squares; other regions: triangles). D) Screenshot of the GWAS-SGL page with the locus of interest centered and a table of SGLs with navigation buttons. E) Screenshot of the 2D embedding models page, which includes a visualization for each sample for the queried gene locus, a table of spatial autocorrelation analysis, and buttons to swap between 1D overlap of ChIP-seq or interaction (raw) signals.
Figure 3.
SGL analysis overview and results. A) Schema of the SGL analysis using fine-mapped SNPs from CAUSALdb, Loop Catalog immune-related samples, and TSS coordinates. B) Summary of results across all 4 diseases, including the total number of GWAS hits (blue), SNPs found in an SGL (orange), genes found in an SGL (green), and total SGLs (red). C) Distribution of SNP counts with respect to gene (left) and distribution of gene counts with respect to SNP (right) for T1D. D) Number of SGL genes that belong to a consensus list of T1D genes (green) versus those that are unique (orange). E) Example of an SGL between rs61839660 (red) and the genes IL15RA (red arc) and RBM17 (blue arc). Six tracks with arcs represent H3K27ac HiChIP loops for naïve CD4+ T cells, naïve CD8+ T cells, naïve B cells, natural killer cells, monocytes, and nonclassical monocytes, derived from the Schmiedel et al. 2018 samples merged across all donors.
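The SNP-to-gene linking idea in panel A can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes a simple rule in which a SNP is linked to a gene when the SNP lies in one anchor of a loop and the gene's TSS lies in the opposite anchor. The `Loop` class, the `link_snps_to_genes` function, the half-open anchor intervals, the restriction to intra-chromosomal loops, and all identifiers and coordinates are hypothetical toy choices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Loop:
    """A toy intra-chromosomal loop with two anchors (half-open intervals)."""
    chrom: str
    a_start: int
    a_end: int
    b_start: int
    b_end: int


def link_snps_to_genes(snps, loops, tss):
    """Return sorted (snp, gene) pairs bridged by at least one loop.

    snps: {snp_id: (chrom, position)}
    tss:  {gene:   (chrom, position)}
    Assumed rule: SNP in one anchor, TSS in the opposite anchor.
    """
    links = set()
    for loop in loops:
        anchors = [(loop.a_start, loop.a_end), (loop.b_start, loop.b_end)]
        for i, (start, end) in enumerate(anchors):
            other_start, other_end = anchors[1 - i]
            snp_hits = [s for s, (c, p) in snps.items()
                        if c == loop.chrom and start <= p < end]
            gene_hits = [g for g, (c, p) in tss.items()
                         if c == loop.chrom and other_start <= p < other_end]
            links.update((s, g) for s in snp_hits for g in gene_hits)
    return sorted(links)


# Toy inputs; identifiers and coordinates are made up for illustration.
snps = {"rs_toy1": ("chr1", 1050), "rs_toy2": ("chr1", 9000)}
tss = {"GENE_A": ("chr1", 5100), "GENE_B": ("chr1", 20000)}
loops = [Loop("chr1", 1000, 2000, 5000, 6000)]

sgls = link_snps_to_genes(snps, loops, tss)
print(sgls)
```

A linear scan over SNPs and TSSs per loop is adequate for a toy example; at the scale of millions of loops, an interval index would be a more realistic choice.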
Figure 4.
Motif and paired-motif analysis of loop anchors. A) Schematic of the 1D and 2D (paired) motif analysis. For 1D motif enrichment analysis, HiChIP loops are aggregated across all samples in the sample set, conserved anchors are identified, and motif enrichment analysis is performed directly on the loop anchors. Paired motif analysis is performed on a per-sample basis. HiChIP loops from a single sample are overlapped with ChIP-seq peaks, and paired motif analysis for loops is applied after motif scanning in ChIP-seq peaks. B) For the 1D motif enrichment analysis, Venn diagrams show the overlap between immune (n = 27) and non-immune (n = 27) H3K27ac HiChIP sample sets for unique loop anchors, conserved loop anchors, and significantly enriched motifs in conserved loop anchors reported by MEME Suite SEA (E-value < 0.01). C) Bubble plot for the union of the top 15 motifs from each H3K27ac sample set (HCRegLoops-All, HCRegLoops-Immune, and HCRegLoops-Non-Immune) and top 6 motifs from the CTCF sample set (HCStructLoops) (24 total motifs). The q-value is represented on a range from 0 (gray) to 300 (magenta) and the log2(enrichment ratio) is represented by a circle radius from 0.5 to 2. D) QQ plot testing p-values for naïve CD4+ T cell 1829-RH-1 for the paired motif analysis bootstrap method. E) Heatmap of significant motif pairs (center) where rows and columns represent motifs on opposite anchors and each cell represents the proportion of samples where the given motif pair is significant. The distributions of a given motif across the whole genome and within the top 25 motif pairs are represented on the top and right, respectively.

References

    1. Babbi G., Martelli P. L., Profiti G., Bovo S., Savojardo C., & Casadio R. (2017). eDGAR: A database of Disease-Gene Associations with annotated Relationships among genes. BMC Genomics, 18(Suppl 5), 554. 10.1038/s41467-018-04948-3 https://doi.org/10.1038/s41467-018-04948-5
    2. Bailey T. L., Johnson J., Grant C. E., & Noble W. S. (2015). The MEME Suite. Nucleic Acids Research, 43(W1), W39–49. 10.1093/nar/gkv416
    3. Barrett T., Wilhite S. E., Ledoux P., Evangelista C., Kim I. F., Tomashevsky M., Marshall K. A., Phillippy K. H., Sherman P. M., Holko M., Yefanov A., Lee H., Zhang N., Robertson C. L., Serova N., Davis S., & Soboleva A. (2013). NCBI GEO: Archive for functional genomics data sets--update. Nucleic Acids Research, 41(Database issue), D991–995. 10.1093/nar/gks1193
    4. Bell C. C., Balic J. J., Talarmain L., Gillespie A., Scolamiero L., Lam E. Y. N., Ang C.-S., Faulkner G. J., Gilan O., & Dawson M. A. (2024). Comparative cofactor screens show the influence of transactivation domains and core promoters on the mechanisms of transcription. Nature Genetics, 56(6), 1181–1192. 10.1038/s41588-024-01749-z
    5. Benson G. (1999). Tandem repeats finder: A program to analyze DNA sequences. Nucleic Acids Research, 27(2), 573–580. 10.1093/nar/27.2.573