Nature. 2009 Mar 12;458(7235):223-7.
doi: 10.1038/nature07672. Epub 2009 Feb 1.

Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals


Mitchell Guttman et al. Nature.

Abstract

There is growing recognition that mammalian cells produce many thousands of large intergenic transcripts. However, the functional significance of these transcripts has been particularly controversial. Although there are some well-characterized examples, most (>95%) show little evidence of evolutionary conservation and have been suggested to represent transcriptional noise. Here we report a new approach to identifying large non-coding RNAs using chromatin-state maps to discover discrete transcriptional units intervening known protein-coding loci. Our approach identified approximately 1,600 large multi-exonic RNAs across four mouse cell types. In sharp contrast to previous collections, these large intervening non-coding RNAs (lincRNAs) show strong purifying selection in their genomic loci, exonic sequences and promoter regions, with greater than 95% showing clear evolutionary conservation. We also developed a functional genomics approach that assigns putative functions to each lincRNA, demonstrating a diverse range of roles for lincRNAs in processes from embryonic stem cell pluripotency to cell proliferation. We obtained independent functional validation for the predictions for over 100 lincRNAs, using cell-based assays. In particular, we demonstrate that specific lincRNAs are transcriptionally regulated by key transcription factors in these processes such as p53, NF-κB, Sox2, Oct4 (also known as Pou5f1) and Nanog. Together, these results define a unique collection of functional lincRNAs that are highly conserved and implicated in diverse biological processes.
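As a rough illustration of the idea of finding chromatin domains that lie between known protein-coding loci, the following minimal sketch filters a list of candidate domains against a list of annotated gene intervals. It is not the authors' pipeline: the `Interval` type, the `intergenic_domains` function, the half-open coordinate convention and the optional `min_length` cutoff are all hypothetical names and choices introduced here for illustration.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    """A genomic interval with half-open coordinates [start, end). Hypothetical type."""
    chrom: str
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    """True when two intervals on the same chromosome share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def intergenic_domains(
    domains: List[Interval],
    genes: List[Interval],
    min_length: int = 0,  # illustrative cutoff only, not a value taken from the paper
) -> List[Interval]:
    """Keep candidate chromatin domains that overlap no annotated gene interval."""
    return [
        d
        for d in domains
        if d.end - d.start >= min_length
        and not any(overlaps(d, g) for g in genes)
    ]


if __name__ == "__main__":
    # Toy coordinates, invented for the example.
    candidates = [
        Interval("chr1", 100, 200),
        Interval("chr1", 500, 900),
        Interval("chr2", 100, 200),
    ]
    known_genes = [Interval("chr1", 150, 300)]
    for domain in intergenic_domains(candidates, known_genes):
        print(domain)
```

The quadratic scan is adequate for a toy example; a genome-scale analysis would more plausibly sort the intervals or use an interval tree.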


Figures

Figure 1. Intergenic K4–K36 domains produce multi-exonic RNAs
a, Example of an intergenic K4–K36 domain and the K4–K36 domains of two flanking protein-coding genes. Each histone modification is plotted as the number of DNA fragments obtained by ChIP-Seq at each position. Black boxes indicate known protein-coding regions and grey boxes are intergenic K4–K36 domains. Arrowheads indicate the orientation of transcription. b, Intergenic K4–K36 domains were interrogated for presence of transcription by hybridizing RNA to DNA tiling arrays. The RNA hybridization intensity is plotted in red. RNA peaks were determined and are represented by grey boxes. The presence of a spliced transcript was validated by hybridization to a northern blot (right). c, Connectivity between the inferred exons was validated by PCR with reverse transcription (RT–PCR). Right top shows RT–PCR validation of each exon, right bottom shows RT–PCR across each consecutive exon.
Figure 2. lincRNA K4–K36 domains do not encode proteins and are conserved in their exons and promoters
a, Density plot of the maximum CSF score (Methods) across intergenic K4–K36 domains (grey) and known protein-coding genes (black). The maximum CSF scores for known lincRNAs are indicated as black points at the bottom. b, Cumulative distribution of sequence conservation across mammals for lincRNA exons (blue), protein-coding exons (green), introns (red) and known non-coding RNA exons (grey). c, Cumulative distribution of sequence conservation for lincRNA promoters (blue), random intergenic regions (red), and protein-coding promoters (green). LOD, logarithm of the odds ratio; Pi is the conservation metric (see Supplementary Methods). d, Enrichment of various promoter features plotted as the distance from the start of the K36me3 region averaged across all lincRNAs. Enrichment in each cell type of K4me3 domains across mouse ESCs (red), MEF (black), MLF (blue) and NPC (green) is shown (top panel). Enrichment of 5′ CAGE-tag density representing the 5′ end of RNA molecules (middle panel) and conservation scores in the K4me3 region are shown (bottom panel).
Figure 3. lincRNAs show strong associations with other lincRNAs and with several biological processes
a, Association matrix of lincRNA and functional gene sets. Functional gene sets (columns) and lincRNAs (rows) are shown as positively (red), negatively (blue) or not associated (white) with lincRNA expression profiles. The black boxes highlight two significant biclusters in the matrix. b, Gene ontology of the protein-coding genes in these clusters is shown and plotted as the −log(P value) for the enrichment of each Gene Ontology term. c, Map of mouse genomic locus (Hoxc) containing HOTAIR. HOTAIR (red) and Frigidair (blue) show diametrically opposed expression patterns between mouse forelimb (anterior) and mouse hindlimb (posterior). d, Map of genomic locus containing COX2 along with the location of lincRNA-COX2. Quantitative RT–PCR shows that lincRNA-COX2 is upregulated in TLR4-stimulated cells (blue) but not TLR3-stimulated cells (grey). e, A map of the genomic locus containing Sox2 shows a lincRNA ~50 kb upstream that is expressed specifically in ESCs. f, K36me3 enrichment across four cell types for lincRNAs bound by Oct4 or Nanog (left). Red indicates high enrichment, blue denotes low enrichment. The lincRNA-Sox2 promoter was cloned into a luciferase reporter construct and assayed for transcriptional activity with Sox2 and Oct4 alone, together and controls (right). The y-axis represents the transcriptional activity of this promoter relative to a Renilla construct. Error bars are ± s.d. of three replicate transfections.
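Panel b of Figure 3 plots −log(P value) for Gene Ontology term enrichment. The legend does not state which test or logarithm base was used, so the sketch below is only one common way such a score could be computed: a one-sided hypergeometric test with a base-10 logarithm. Both choices, and the function names `hypergeom_sf` and `neg_log10_p`, are assumptions made for illustration.

```python
import math


def hypergeom_sf(k: int, population: int, annotated: int, drawn: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, annotated, drawn).

    population: total number of genes considered
    annotated:  genes in the population carrying the term
    drawn:      genes in the cluster being tested
    k:          genes in the cluster carrying the term
    """
    upper = min(annotated, drawn)
    total = math.comb(population, drawn)
    favourable = sum(
        math.comb(annotated, i) * math.comb(population - annotated, drawn - i)
        for i in range(max(k, 0), upper + 1)
    )
    return favourable / total


def neg_log10_p(p: float) -> float:
    """Convert a P value to -log10(P); returns infinity when P is zero."""
    return math.inf if p <= 0.0 else -math.log10(p)


if __name__ == "__main__":
    # Toy numbers, invented for the example.
    p = hypergeom_sf(k=3, population=10, annotated=3, drawn=4)
    print(p, neg_log10_p(p))
```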

Comment in

  • Missing lincs in the transcriptome.
    Gingeras T. Nat Biotechnol. 2009 Apr;27(4):346-7. doi: 10.1038/nbt0409-346. PMID: 19352372. No abstract available.


