Nucleic Acids Res. 2005 May 19;33(9):2838-51.
doi: 10.1093/nar/gki583. Print 2005.

Mapping of transcription start sites in Saccharomyces cerevisiae using 5' SAGE

Zhihong Zhang et al. Nucleic Acids Res.

Abstract

A minimally addressed area in Saccharomyces cerevisiae research is the mapping of transcription start sites (TSS). Mapping TSS in S.cerevisiae has the potential to contribute to our understanding of gene regulation, transcription, mRNA stability and other aspects of RNA biology. Here, we use 5' SAGE to map TSS in S.cerevisiae. Tags identifying the first 15-17 bases of the transcripts are created, ligated to form ditags, amplified, concatemerized and ligated into a vector to create a library. Each clone sequenced from this library identifies 10-20 TSS. We have identified 13,746 unique, unambiguous sequence tags from 2231 S.cerevisiae genes. The TSS identified in this study are consistent with published results, with the primer extension results described here, and with expectations based on previous work on transcription initiation. We have aligned the sequences flanking 4637 TSS to identify the consensus sequence A(A-rich)₅NPyA(A/T)NN(A-rich)₆, which confirms and expands the previously reported PyA(A/T)Pu consensus pattern. The TSS data allowed the identification of a previously unrecognized gene, uncovered errors in previous annotation, and identified potential regulatory RNAs and upstream open reading frames in 5'-untranslated regions.
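The alignment step described in the abstract (collecting the sequence flanking each TSS and summarizing it column by column) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the toy sequence and the TSS coordinates are hypothetical, only the forward strand is handled, and a simple majority base per column stands in for a proper degenerate consensus or sequence logo.

```python
from collections import Counter


def flanking_windows(sequence, tss_positions, flank=10):
    """Return the +/-flank window around each TSS (0-based index).

    Windows that would run off either end of the sequence are skipped.
    Assumption: all TSS are on the forward strand of `sequence`.
    """
    windows = []
    for pos in tss_positions:
        start, end = pos - flank, pos + flank + 1
        if start >= 0 and end <= len(sequence):
            windows.append(sequence[start:end])
    return windows


def position_counts(windows):
    """Count bases column by column across equal-length windows."""
    return [Counter(column) for column in zip(*windows)]


def majority_consensus(counts):
    """Pick the most frequent base in each column (a crude consensus)."""
    return "".join(column.most_common(1)[0][0] for column in counts)


# Hypothetical toy data: three TSS in a made-up 27-base sequence.
toy_sequence = "GGAACAAGGTTAACAAGGCCAACAATT"
toy_tss = [5, 14, 23]

windows = flanking_windows(toy_sequence, toy_tss, flank=2)
consensus = majority_consensus(position_counts(windows))
```

With real data, the per-column counts would be the input to a logo or to a degenerate-base call (for example Py or A/T), which a majority vote cannot express.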

Figures

Figure 1
Scheme of the 5′ SAGE methodology. The poly(A)-rich RNA is divided into two pools. Different oligo sets (blue or red) are used to carry out reverse transcription, template-switching and primer extension. After digestion with the tagging enzyme (yellow oval), the two sample pools are combined to make 120 bp ditags. The anchoring enzyme (blue triangle) is used to generate a 50 bp ditag for concatenation. Specifics are discussed in Materials and Methods.
Figure 2
5′ SAGE tag distribution around the ORF start codon. (A) Distribution of all 8194 unitags from 500 bp upstream of the ATG to 1000 bp downstream of the ATG. (B) Distribution of all 1165 multiple occurrence unitags from 500 bp upstream of the ATG to 1000 bp downstream of the ATG. (C) Zoom-in view of the tag distribution shown in (B) within the 200 bp putative 5′-UTR region. (D) Cumulative distribution function (CDF) plots of all unitags (red) and multiple occurrence unitags (blue).
Figure 3
Correlation between tag occurrence and ORF length/gene expression level. The occurrence of each 5′-UTR unitag was plotted against the corresponding gene's (A) ORF length and (B) expression level (mRNA copies/cell). The expression level data were obtained from a website based on microarray data (24), and the ORF length was calculated from the SGD annotation (20). A linear regression model was applied (R² = 0.3726, P < 2.2 × 10⁻¹⁶). Negative (−0.6320, P = 1.67 × 10⁻⁶) and positive (0.3804, P < 2 × 10⁻¹⁶) correlation coefficients were observed from the model for ORF length (bp) and expression level (mRNA copies/cell), respectively.
Figure 4
Primer extension verification. Primer extension was used to map the TSS of 12 S.cerevisiae genes. For each gene, a gene-specific 32P-end-labeled primer was used to reverse transcribe to the 5′ end of the respective mRNA. Fragment sizes were analyzed by denaturing PAGE and autoradiography. Lane M: Φ174 Hinf I DNA markers; lane 1: primer extension reaction; lane 2: reaction without RNA (negative control). (a) The actual marker size (nt); (b) the corresponding position (bp) relative to the ATG start codon; (c) the assigned 5′ SAGE tag position (bp) with occurrence in parentheses (some tags are assigned to a single band because of the gel resolution); (d) numbers in brackets give the estimated position (bp) of apparent bands without 5′ SAGE data.
Figure 5
The consensus sequence of the TSS. The sequence of ±10 bp flanking each TSS was extracted from the S.cerevisiae genomic sequence and analyzed using WebLOGO (38,39). Sequence LOGO of TSS flanking sequences derived from (A) all 4936 unitags mapping to the putative 5′-UTR region, (B) 1041 multiple occurrence unitags mapping to the putative 5′-UTR and (C) 3258 unitags mapping to the coding region and putative 3′-UTR (negative control).
Figure 6
New features predicted in S.cerevisiae. Insights from the TSS information in 5′ SAGE data, combined with comparative genomics methods, have versatile uses, including: (A) New gene discovery: synteny view of S.cerevisiae chromosome IV and III regions, which are believed to be duplicated regions resulting from the whole genome duplication. Each orthologous gene pair is shown in the same color. Two 5′ SAGE unitags with a total of three occurrences (yellow arrow) revealed a new gene, YCL048W-A, which is homologous to YDR524C-B. (B) Determining the real ATG start codon: two unitags, one with multiple occurrences, map to the coding region of LSM6, while no tag is associated with its 5′-UTR. Protein sequence alignment to orthologs from other Saccharomyces species further supports the proposed LSM6 translation start position. (C) Search for putative regulatory RNA elements similar to SRG1-SER3. Two multiple occurrence tags upstream of the ODC2 coding region are shown (yellow arrow). The phylogenetic comparison showed homology around these two TSS positions among multiple species. There is also a conventional SAGE tag (green arrow) that maps to this region at position −286. (D) Example of a uORF-containing gene. Four unitags, one with multiple occurrences, were mapped to 300+ bp upstream of the PCL5 coding region. Two small uORFs (blue and red) are found and are conserved among all four sensu stricto species in position, length and sequence. Five other hemiascomycete species also contain similar putative uORF(s) in that region. In C.glabrata, three uORFs are present, with two overlapping in different reading frames.
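The cumulative distribution plots mentioned in the Figure 2 caption summarize where tags fall relative to the ATG start codon. The Python sketch below shows one way such an empirical CDF could be computed; it is an illustration only, not the analysis code used in the study. The helper names and the toy coordinates are hypothetical, and forward-strand genes are assumed so that negative values mean upstream of the ATG.

```python
def relative_position(tag_start, atg_start):
    """Tag position relative to the ATG (negative = upstream).

    Assumption: forward-strand gene, both coordinates in the same system.
    """
    return tag_start - atg_start


def empirical_cdf(values):
    """Return sorted (value, cumulative fraction) pairs.

    Repeated values are collapsed to one point carrying the highest
    cumulative fraction reached at that value.
    """
    ordered = sorted(values)
    total = len(ordered)
    cdf = {}
    for rank, value in enumerate(ordered, start=1):
        cdf[value] = rank / total  # later duplicates overwrite earlier ranks
    return sorted(cdf.items())


# Hypothetical toy data: (tag start, ATG start) coordinate pairs.
toy_pairs = [(970, 1000), (990, 1000), (2470, 2500), (3005, 3000)]
toy_positions = [relative_position(tag, atg) for tag, atg in toy_pairs]
toy_cdf = empirical_cdf(toy_positions)
```

Reading the CDF at position 0 gives the fraction of tags that start upstream of the ATG, which is the quantity a plot of this kind makes easy to compare between all unitags and multiple occurrence unitags.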

