Nucleic Acids Res. 2024 Jan 5;52(D1):D322-D333.
doi: 10.1093/nar/gkad1048.

TE-TSS: an integrated data resource of human and mouse transposable element (TE)-derived transcription start site (TSS)


Xiaobing Gu et al. Nucleic Acids Res. 2024.

Abstract

Transposable elements (TEs) are abundant in the genome and serve as crucial regulatory elements. Some TEs function as epigenetically regulated promoters, and these TE-derived transcription start sites (TSSs) play a crucial role in regulating genes associated with specific functions, such as cancer and embryogenesis. However, the lack of an accessible database that systematically gathers TE-derived TSS data is a current research gap. To address this, we established TE-TSS, an integrated data resource of human and mouse TE-derived TSSs (http://xozhanglab.com/TETSS). TE-TSS has compiled 2681 RNA sequencing datasets, spanning various tissues, cell lines and developmental stages. From these, we identified 5768 human TE-derived TSSs and 2797 mouse TE-derived TSSs, with 47% and 38% being experimentally validated, respectively. TE-TSS enables comprehensive exploration of TSS usage in diverse samples, providing insights into tissue-specific gene expression patterns and transcriptional regulatory elements. Furthermore, TE-TSS compares TE-derived TSS regions across 15 mammalian species, enhancing our understanding of their evolutionary and functional aspects. The establishment of TE-TSS facilitates further investigations into the roles of TEs in shaping the transcriptomic landscape and offers valuable resources for comprehending their involvement in diverse biological processes.

Figures

Graphical Abstract
Figure 1.
TE-TSS workflow. (A) The TE-TSS pipeline involves the integration of annotated and predicted TSSs. TE-derived TSSs were identified by intersecting TEs with TSSs, followed by the merging of adjacent TE-derived TSSs. The resulting regions were then expanded by ±50 bp to establish TE-derived TSS regions, which formed the basis for subsequent evolutionary and functional analyses. (B) Schematic diagram showing the identification of annotated/predicted TSSs and TE-derived TSSs.
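The three steps named in panel (A), intersecting TEs with TSSs, merging adjacent TE-derived TSSs and expanding by ±50 bp, can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: it assumes a single chromosome, ignores strand, uses 0-based half-open coordinates, and introduces a hypothetical `max_gap` parameter because the caption does not state how close two TSSs must be to count as adjacent.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # (start, end), 0-based half-open; an assumption of this sketch


def te_derived_tss(tss_positions: Iterable[int], te_intervals: Iterable[Interval]) -> List[int]:
    """Keep the TSS positions that fall inside at least one TE interval."""
    tes = list(te_intervals)
    return sorted(p for p in set(tss_positions) if any(s <= p < e for s, e in tes))


def merge_adjacent(positions: Iterable[int], max_gap: int) -> List[Interval]:
    """Merge positions into regions when consecutive positions are at most max_gap apart.

    max_gap is a hypothetical parameter; the source does not give a merging distance.
    """
    regions: List[Interval] = []
    for p in sorted(positions):
        if regions and p - (regions[-1][1] - 1) <= max_gap:
            regions[-1] = (regions[-1][0], p + 1)  # extend the current region
        else:
            regions.append((p, p + 1))  # start a new single-base region
    return regions


def expand(regions: Iterable[Interval], flank: int = 50) -> List[Interval]:
    """Extend each region by `flank` bp on both sides, clamping the start at 0."""
    return [(max(0, s - flank), e + flank) for s, e in regions]


# Toy coordinates for illustration only; they are not taken from TE-TSS.
tes = [(100, 200), (500, 600)]
tss = [90, 120, 125, 550]
hits = te_derived_tss(tss, tes)
tss_regions = expand(merge_adjacent(hits, max_gap=10))
```

On real data the same steps are usually run with an interval tool that handles chromosomes and strands; the sketch only makes the order of operations explicit.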
Figure 2.
Integration and analysis of human and mouse TSS references for usage estimation. (A) Upset plot displaying the distribution of human TSSs across RefSeq, GENCODE and Ensembl, as well as TSSs among them that have experimental validation by existing TSS assays. These TSSs with experimental validation are defined as annotated TSSs. (B) Distribution of TSSs identified across varying numbers of samples. Predicted TSSs were retained if they appeared in a minimum of three samples. (C) Upset plots showing the distribution of predicted TSSs overlapping with existing TSS assays in K562 (left panel) and GM12878 (right panel) cells.
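The retention rule in panel (B), keeping a predicted TSS only if it appears in at least three samples, amounts to a recurrence filter. The sketch below assumes each sample is represented as a collection of TSS identifiers; the identifier format and the function name `recurrent_tss` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def recurrent_tss(per_sample_tss: Dict[str, Iterable[str]], min_samples: int = 3) -> Set[str]:
    """Return the TSS identifiers observed in at least `min_samples` samples."""
    counts: Counter = Counter()
    for tss_ids in per_sample_tss.values():
        counts.update(set(tss_ids))  # count each TSS at most once per sample
    return {tss for tss, n in counts.items() if n >= min_samples}


# Toy input for illustration only.
samples = {
    "s1": ["chr1:100:+", "chr1:200:+"],
    "s2": ["chr1:100:+", "chr1:100:+"],
    "s3": ["chr1:100:+", "chr1:200:+"],
    "s4": ["chr2:50:-"],
}
kept = recurrent_tss(samples)
```

Deduplicating within each sample matters here: a TSS reported twice in one sample should still count as one supporting sample.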
Figure 3.
Identification of TE-derived TSSs. (A) Composition of predicted and annotated TE-derived TSSs, along with the distribution of TE classes in the human genome, depicted in a pie chart. (B) Distribution of gene types associated with human annotated TE-derived TSSs, categorizing 1502 lncRNAs, 920 protein-coding genes and 46 pseudogenes. (C) Radar plot illustrating the enrichment score of TE-derived TSSs originating from distinct TE classes. The enrichment score, representing the log2 ratio of observed counts to expected counts, is indicative of the propensity for TEs within each class to evolve into TSSs. Observed counts denote the actual number of TEs within the TE class that have evolved into TSSs, while expected counts are calculated under the assumption of equal probability for TEs from each class to evolve into TSSs. (D) Beanplot showing the distribution of sequence divergence for different TE classes, including those TEs that have evolved into TSSs. The presentation includes boxplots depicting the median and interquartile range (IQR), with whiskers extending to 1.5 times the IQR. Wilcoxon rank-sum test P-values are shown. (E) The protein-coding genes with higher expression levels are more likely to have TE-derived TSSs. Wilcoxon rank-sum test P-values are shown. (F) Genes with TE-derived TSSs exhibit significantly higher (P-value < 2.2 × 10^-16, Wilcoxon rank-sum test) tissue specificity than housekeeping genes. (G) Identification of TE-derived TSSs in different human and mouse embryonic cells, depicted through boxplots illustrating the median and IQR, with whiskers extending to 1.5 times the IQR. (H) Usage of TE-derived TSSs in different human and mouse embryonic cells. Boxplots display median and IQR, with whiskers extending to 1.5 times the IQR. (I) Gene ontology analysis of genes with tissue-specific TE-derived TSSs detected in human embryonic cells.
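The enrichment score described in panel (C) can be made concrete with a short calculation. The sketch below rests on one reading of the caption: if every TE is equally likely to become a TSS regardless of class, the expected count for a class is its TE count multiplied by the overall fraction of TEs that became TSSs. That reading, the function name `enrichment_scores` and the toy counts are assumptions, not values or code from the paper.

```python
import math
from typing import Dict


def enrichment_scores(te_counts: Dict[str, int], tss_counts: Dict[str, int]) -> Dict[str, float]:
    """log2(observed / expected) per TE class.

    te_counts:  number of TEs in each class
    tss_counts: number of TEs in each class that overlap a TSS (observed counts)
    Expected counts assume one shared TE-to-TSS rate across all classes.
    """
    overall_rate = sum(tss_counts.values()) / sum(te_counts.values())
    scores: Dict[str, float] = {}
    for te_class, n_te in te_counts.items():
        observed = tss_counts.get(te_class, 0)
        expected = n_te * overall_rate
        # A class with no observed TSSs has an undefined log ratio; report -inf.
        scores[te_class] = math.log2(observed / expected) if observed > 0 else float("-inf")
    return scores


# Toy counts for illustration only.
toy_scores = enrichment_scores({"LTR": 100, "LINE": 300}, {"LTR": 30, "LINE": 10})
```

With these toy counts the shared rate is 40/400 = 0.1, so the LTR class (30 observed against 10 expected) scores log2(3) and the LINE class (10 observed against 30 expected) scores log2(1/3). A positive score marks a class that contributes more TSSs than its abundance alone would predict.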
Figure 4.
Evolutionary analysis of TE-derived TSSs. (A) (Left) Schematic diagram of the gene structure of ARHGAP15 with a MER130-derived alternative TSS across various species. The dotted box highlights the specific region corresponding to the TE-derived TSS. (Right) Multiple sequence alignment of this MER130-derived TSS region across various species. (B) Schematic diagram showing that a primate-conserved L1MEc-derived alternative TSS drives the production of a truncated MAEL isoform in testis samples.
Figure 5.
TF binding motifs in TE-derived TSS regions. (A) TF motif enrichment at TE-derived TSSs for each TE family. (B) Top 10 motifs with the highest occurrence frequency identified within TE-derived TSS regions in both humans and mice. Motifs with P-value < 10^-4 are shown. (C) An example of an LTR-derived TSS region predicted to contain binding motifs for ZNF148, ZNF740 and CTCF. Notably, ChIP-seq signals for ZNF148, ZNF740 and CTCF exhibit significant peaks in this LTR-derived TSS region.
Figure 6.
Overview of TE-TSS interface and usage. (A) TE-TSS offers five distinctive functional modules. (B) Users can search for specific TE-derived TSSs based on species, gene and sample parameters in the Home module. (C) The Browser module allows exploration of TSS usage across various samples using the genome browser. The button within the dotted box facilitates switching between sample files. (D) The TE-derived TSS search module enables a comprehensive exploration of available features. (E) Homology analysis involves multiple sequence alignment, modified BLAT score and consensus sequence browser for TE-derived TSS regions. (F) The results of motif analysis are visually presented for the TE-derived TSS region. (G) A list of TSSs and TE-derived TSSs is available for download in the Home module. (H) Convenient access to RNA-seq file information and associated downloads.
