Ab initio reconstruction of cell type–specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincRNAs

A Corrigendum to this article was published on 01 July 2010

This article has been updated

Abstract

Massively parallel cDNA sequencing (RNA-Seq) provides an unbiased way to study a transcriptome, including both coding and noncoding genes. Until now, most RNA-Seq studies have depended crucially on existing annotations and thus focused on expression levels and variation in known transcripts. Here, we present Scripture, a method to reconstruct the transcriptome of a mammalian cell using only RNA-Seq reads and the genome sequence. We applied it to mouse embryonic stem cells, neuronal precursor cells and lung fibroblasts to accurately reconstruct the full-length gene structures for most known expressed genes. We identified substantial variation in protein coding genes, including thousands of novel 5′ start sites, 3′ ends and internal coding exons. We then determined the gene structures of more than a thousand large intergenic noncoding RNA (lincRNA) and antisense loci. Our results open the way to direct experimental manipulation of thousands of noncoding RNAs and demonstrate the power of ab initio reconstruction to render a comprehensive picture of mammalian transcriptomes.

Figure 1: Scripture: a method for ab initio transcriptome reconstruction from RNA-Seq data.
Figure 2: Scripture correctly reconstructs full-length transcripts for most annotated protein coding genes.
Figure 3: Alternative 5′ ends, 3′ ends and novel coding exons in transcripts reconstructed by Scripture.
Figure 4: Noncoding transcripts reconstructed by Scripture.
Figure 5: Protein coding capacity, conservation levels and expression of lincRNAs and multi-exonic antisense transcripts.

Accession codes

Accessions

Gene Expression Omnibus

Change history

  • 09 July 2010

    In the version of this article initially published, the fourth sentence in the methods section “RNA extraction and library preparation” described a “procedure that combines a random priming step with a shearing step (refs. 8, 9, 28) and results in fragments of ~700 bp in size”. It should have read: “procedure that combines fragmentation of mRNA to a peak size of ~750 nucleotides by heating (ref. 6) followed by random-primed reverse transcription (ref. 8)”. The error has been corrected in the HTML and PDF versions of the article.

References

  1. Carninci, P. et al. The transcriptional landscape of the mammalian genome. Science 309, 1559–1563 (2005).

  2. Kapranov, P. et al. RNA maps reveal new RNA classes and a possible function for pervasive transcription. Science 316, 1484–1488 (2007).

  3. Bertone, P. et al. Global identification of human transcribed sequences with genome tiling arrays. Science 306, 2242–2246 (2004).

  4. Guttman, M. et al. Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature 458, 223–227 (2009).

  5. Khalil, A.M. et al. Many human large intergenic noncoding RNAs associate with chromatin-modifying complexes and affect gene expression. Proc. Natl. Acad. Sci. USA 106, 11667–11672 (2009).

  6. Cloonan, N. et al. Stem cell transcriptome profiling via massive-scale mRNA sequencing. Nat. Methods 5, 613–619 (2008).

  7. Wang, E.T. et al. Alternative isoform regulation in human tissue transcriptomes. Nature 456, 470–476 (2008).

  8. Mortazavi, A., Williams, B.A., McCue, K., Schaeffer, L. & Wold, B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat. Methods 5, 621–628 (2008).

  9. Yassour, M. et al. Ab initio construction of a eukaryotic transcriptome by massively parallel mRNA sequencing. Proc. Natl. Acad. Sci. USA 106, 3264–3269 (2009).

  10. Pan, Q., Shai, O., Lee, L.J., Frey, B.J. & Blencowe, B.J. Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nat. Genet. 40, 1413–1415 (2008).

  11. Maher, C.A. et al. Transcriptome sequencing to detect gene fusions in cancer. Nature 458, 97–101 (2009).

  12. Birol, I. et al. De novo transcriptome assembly with ABySS. Bioinformatics 25, 2872–2877 (2009).

  13. Trapnell, C., Pachter, L. & Salzberg, S.L. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25, 1105–1111 (2009).

  14. Denoeud, F. et al. Annotating genomes with massive-scale RNA sequencing. Genome Biol. 9, R175 (2008).

  15. Pruitt, K.D., Tatusova, T. & Maglott, D.R. NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 35, D61–D65 (2007).

  16. Mikkelsen, T.S. et al. Genome-wide maps of chromatin state in pluripotent and lineage-committed cells. Nature 448, 553–560 (2007).

  17. Lin, M.F., Deoras, A.N., Rasmussen, M.D. & Kellis, M. Performance and scalability of discriminative metrics for comparative gene identification in 12 Drosophila genomes. PLOS Comput. Biol. 4, e1000067 (2008).

  18. Lin, M.F. et al. Revisiting the protein-coding gene catalog of Drosophila melanogaster using 12 fly genomes. Genome Res. 17, 1823–1836 (2007).

  19. Garber, M. et al. Identifying novel constrained elements by exploiting biased substitution patterns. Bioinformatics 25, i54–i62 (2009).

  20. Brown, C.J. et al. A gene from the region of the human X inactivation centre is expressed exclusively from the inactive X chromosome. Nature 349, 38–44 (1991).

  21. Rinn, J.L. et al. Functional demarcation of active and silent chromatin domains in human HOX loci by noncoding RNAs. Cell 129, 1311–1323 (2007).

  22. Willingham, A.T. et al. A strategy for probing the function of noncoding RNAs finds a repressor of NFAT. Science 309, 1570–1573 (2005).

  23. Zhao, J., Sun, B.K., Erwin, J.A., Song, J.J. & Lee, J.T. Polycomb proteins targeted by a short repeat RNA to the mouse X chromosome. Science 322, 750–756 (2008).

  24. Katayama, S. et al. Antisense transcription in the mammalian transcriptome. Science 309, 1564–1566 (2005).

  25. Wu, J.Q. et al. Dynamic transcriptomes during neural differentiation of human embryonic stem cells revealed by short, long, and paired-end sequencing. Proc. Natl. Acad. Sci. USA 107, 5254–5259 (2010).

  26. Ramsköld, D., Wang, E.T., Burge, C.B. & Sandberg, R. An abundance of ubiquitously expressed genes revealed by tissue transcriptome sequence data. PLOS Comput. Biol. 5, e1000598 (2009).

  27. Conti, L. et al. Niche-independent symmetrical self-renewal of a mammalian tissue stem cell. PLOS Biol. 3, e283 (2005).

  28. Berger, M.F. et al. Integrative analysis of the melanoma transcriptome. Genome Res. 20, 413–427 (2010).

  29. Lister, R. et al. Highly integrated single-base resolution maps of the epigenome in Arabidopsis. Cell 133, 523–536 (2008).

  30. Langmead, B., Trapnell, C., Pop, M. & Salzberg, S.L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009).

  31. Ewens, W.J. & Grant, G.R. Statistical Methods in Bioinformatics: An Introduction 2nd edn. (Springer, 2005).

  32. Glaz, J., Naus, J.I. & Wallenstein, S. Scan Statistics (Springer, 2001).

Acknowledgements

We thank M. Wernig (MIT) for providing NPC; M. Lin and M. Kellis (MIT) for CSF code; the Broad Sequencing Platform for sample sequencing; L. Gaffney for assistance with graphics; and C. Burge, J. Merkin, R. Bradley and members of Lander and Regev laboratories—in particular, M. Yassour, T. Mikkelsen and I. Amit—for discussions. A.R. and J.L.R. were supported by the Merkin Family Foundation for Stem Cell Research at the Broad Institute. M. Guttman was supported by a Vertex scholarship. Work was supported by a Burroughs Wellcome Fund Career Award at the Scientific Interface, a US National Institutes of Health PIONEER award, a US National Human Genome Research Institute (NHGRI) R01 grant and the Howard Hughes Medical Institute (A.R.), and NHGRI and the Broad Institute of MIT and Harvard (E.S.L.).

Author information

Contributions

M. Guttman and M. Garber conceived the project, designed research, implemented Scripture, performed computational analysis and wrote the paper. A.G., C.N. and J.Z.L. oversaw cDNA sequencing, provided molecular biology advice and helped to edit the manuscript. J.D. constructed cDNA libraries, performed validation experiments and helped to edit the manuscript. J.R. implemented components of Scripture and provided computational support and technical advice. X.A., L.F. and M.J.K. constructed cDNA libraries. J.L.R. provided reagents and helped edit the manuscript. E.S.L. designed research direction and wrote the paper. A.R. provided cDNA sequencing guidance, conceived the project, designed research direction and wrote the paper.

Corresponding authors

Correspondence to Mitchell Guttman, Manuel Garber or Aviv Regev.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Notes 1 and 2, Supplementary Figures 1–7 (PDF 3117 kb)

Supplementary Table 1

Number of novel transcriptional events in ES, MLF and NPC (XLS 10 kb)

Supplementary Table 2

Primer sequences used for validation of novel events (XLS 13 kb)

Supplementary Software

scripture.jar scripture.src.tgz (ZIP 15384 kb)

Supplementary Data

ES.gff.gz ESTranscriptGraphs.tar.gz (ZIP 37695 kb)

Supplementary Data

MLF.gff.gz MLFTranscriptGraphs.tar.gz (ZIP 14834 kb)

Supplementary Data

NPC.gff.gz NPCTranscriptGraphs.tar.gz (ZIP 42407 kb)

About this article

Cite this article

Guttman, M., Garber, M., Levin, J. et al. Ab initio reconstruction of cell type–specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincRNAs. Nat Biotechnol 28, 503–510 (2010). https://doi.org/10.1038/nbt.1633

  • DOI: https://doi.org/10.1038/nbt.1633
