Nucleic Acids Res. 2024 Aug 12;52(14):8112-8126.
doi: 10.1093/nar/gkae571.

Evidence for widespread translation of 5' untranslated regions

Jose Manuel Rodriguez et al. Nucleic Acids Res. 2024.

Abstract

Ribosome profiling experiments support the translation of a range of novel human open reading frames. By contrast, most peptides from large-scale proteomics experiments derive from just one source, 5' untranslated regions. Across the human genome we find evidence for 192 translated upstream regions, most of which would produce protein isoforms with extended N-terminal ends. Almost all of these N-terminal extensions are from highly abundant genes, which suggests that the novel regions we detect are just the tip of the iceberg. These upstream regions have characteristics that are not typical of coding exons. Their GC-content is remarkably high, even higher than 5' regions in other genes, and a large majority have non-canonical start codons. Although some novel upstream regions have cross-species conservation - five have orthologues in invertebrates for example - the reading frames of two thirds are not conserved beyond simians. These non-conserved regions also have no evidence of purifying selection, which suggests that much of this translation is not functional. In addition, non-conserved upstream regions have significantly more peptides in cancer cell lines than would be expected, a strong indication that an aberrant or noisy translation initiation process may play an important role in translation from upstream regions.

Figures

Graphical Abstract
Figure 1.
Workflow describing the process of discovering 5′ translated upstream regions.
Figure 2.
Alignment of orthologous upstream regions for C1Q4L and SLC7A5. (A) The translated upstream region in C1Q4L. The orthologous sequences are from eutherian mammals. Alignment and colouring adapted from the CodAlignView server and based on the Cactus 241-way mammalian alignments. Synonymous base changes are shown with a light blue background, non-synonymous changes that would result in conservative amino acid substitutions are shown with a dark blue background, and non-synonymous changes that would produce non-conservative substitutions are shown with a yellow background. Frameshifts are highlighted in orange. Stop codons are highlighted in red. The annotated downstream ATG is shown with a green background. The detected peptide is shown above the alignment in red font. Potential start codons mentioned in the text are highlighted with a purple box. Synonymous changes greatly outnumber non-synonymous changes, suggesting that this region is under strong selective pressure. (B) The translated upstream region in SLC7A5. Alignment and colouring as for C1Q4L. The orthologous sequences are from primates only. Most aligned species have frameshifts or a stop codon. The CTG codon is present only in human.
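The distinction between synonymous and non-synonymous base changes used in the alignment colouring can be illustrated with a minimal sketch. This is not the CodAlignView implementation; the function name `classify_codon_change` and its return labels are hypothetical, and the sketch assumes the standard genetic code. Separating conservative from non-conservative substitutions would additionally need an amino acid similarity scheme, which is left out here.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon: str, alt_codon: str) -> str:
    """Label the change from a reference codon to an aligned codon.

    Hypothetical helper for illustration only. Returns one of
    "identical", "stop", "synonymous" or "non-synonymous".
    """
    ref, alt = ref_codon.upper(), alt_codon.upper()
    if ref == alt:
        return "identical"
    ref_aa, alt_aa = CODON_TABLE[ref], CODON_TABLE[alt]
    if alt_aa == "*":
        # The aligned codon is a stop codon.
        return "stop"
    if ref_aa == alt_aa:
        # Different bases, same amino acid.
        return "synonymous"
    return "non-synonymous"


if __name__ == "__main__":
    print(classify_codon_change("CTG", "CTA"))  # both codons encode leucine
    print(classify_codon_change("CTG", "CCG"))  # leucine to proline
```

Counting these labels column by column across an alignment gives a rough sense of whether synonymous changes outnumber non-synonymous ones in a region.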
Figure 3.
Upstream region types and their cross-species conservation. (A) A graphical representation of three types of translated upstream region. Coding exons are represented by thicker boxes, annotated 5′ exons by narrower boxes. The regions of 5′ UTR coloured differently from the background show the differences between the three types of translated regions. 5′ extensions start upstream of the coding exon and run into the coding exon in the same frame, so would generate a protein with a longer N-terminus. uoORFs begin upstream of the coding exon and extend into the coding exon in a different frame. They continue until they reach a stop codon and would produce an entirely different protein. uORFs also begin upstream of the coding exon and would produce a different protein, but they reach a stop codon before the canonical ATG. (B) The cross-species conservation of the 192 translated upstream regions separated into six bins. Chimp/gorilla includes all regions only conserved in chimpanzee, gorilla or both. Mammals includes all translated upstream regions that are conserved at least across mammals, though at least sixteen have more ancient origins.
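The three region types described in panel A can be told apart from the reading frame and the position of the first in-frame stop codon. The following is a minimal sketch under simplifying assumptions, not the authors' pipeline: the function name `classify_upstream_region` is hypothetical, the input is a spliced transcript sequence with 0-based coordinates, and the upstream start position is assumed to be already known.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def classify_upstream_region(mrna: str, upstream_start: int, canonical_atg: int) -> str:
    """Classify an upstream start site as "uORF", "uoORF" or "5' extension".

    Hypothetical helper for illustration only.
    mrna           -- spliced transcript sequence (DNA alphabet)
    upstream_start -- 0-based index of the first base of the upstream start codon
    canonical_atg  -- 0-based index of the A of the annotated ATG
    """
    if not 0 <= upstream_start < canonical_atg:
        raise ValueError("upstream_start must lie 5' of canonical_atg")
    seq = mrna.upper()
    # Walk codon by codon in the frame set by the upstream start codon.
    for pos in range(upstream_start, len(seq) - 2, 3):
        if seq[pos:pos + 3] in STOP_CODONS:
            if pos + 3 <= canonical_atg:
                # The stop codon ends before the canonical ATG begins.
                return "uORF"
            break
    # No stop before the canonical ATG: the frame decides the type.
    in_frame = (canonical_atg - upstream_start) % 3 == 0
    return "5' extension" if in_frame else "uoORF"


if __name__ == "__main__":
    # Toy sequences, not real transcripts.
    print(classify_upstream_region("CTGGCCATGGCTTAA", 0, 6))
    print(classify_upstream_region("CTGGCCTAAGATGGCTTAA", 0, 10))
    print(classify_upstream_region("CTGGCCAATGGCTGA", 0, 7))
```

The toy sequences use a CTG as the upstream start purely for illustration; any start position can be passed in.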
Figure 4.
Conserved upstream regions extend functional domains. (A) The translated upstream region in CCDC8. The orthologous sequences are from eutherian mammals. Alignment and colouring adapted from the CodAlignView server and based on the Cactus 241-way mammalian alignments. Synonymous base changes are shown with a light blue background, non-synonymous changes that would result in conservative amino acid substitutions are shown with a dark blue background, and non-synonymous changes that would produce non-conservative substitutions are shown with a yellow background. The annotated downstream ATG is shown with a green background. The detected peptide is shown above the alignment in red font. The start codon is highlighted with a purple box. (B) The AlphaFold (59) model for coiled coil domain containing 8 from Iberian lynx downloaded from UniProt (A0A485NL47) with the novel human N-terminal sequence painted onto the structure. The novel region coded by the translated upstream region (in yellow) completes a PNMA N-terminal RRM-like domain. (C) The AlphaFold model for Helicase with zinc finger 2 (from gene HELZ2) from Pallas’ mastiff bat downloaded from UniProt (A0A7J8HGE4) with the novel human N-terminal sequence painted onto the structure. The novel region coded by the translated upstream region (in yellow) completes a globular structural domain.
Figure 5.
Translated upstream region characteristics. (A) Distribution of the GC-content of the exons containing the translated upstream regions (blue) versus principal 5′ UTR (yellow) and principal coding transcripts (green). (B) The proportion of peptide-spectrum matches (PSMs) in three types of proteomics experiments (tissues, cell lines and biopsies) for genes with upstream translations. PSMs have been divided into four groups by age of translated upstream regions, either non-conserved (<50 MYA) or conserved at least in Strepsirrhini (>75 MYA), and by whether the peptide mapped to the annotated gene (Canonical) or to the upstream region (TUR). (C) Protein expression and Kozak strength. The percentage of the PSMs for genes that map to the N-terminal extensions, plotted against the strength of the Kozak motif for four different start codons (ACG, ATG, CTG and GTG). (D) Non-synonymous to synonymous ratios (NS/Syn) for rare and common alleles for all translated upstream regions, for recently evolved translated upstream regions (<50 MYA) and for translated upstream regions conserved at least in Strepsirrhini (>75 MYA).
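The GC-content compared in panel A is the fraction of G and C bases in a sequence. A minimal sketch of that calculation follows; the function name `gc_content` and the choice to ignore ambiguous bases such as N are assumptions made for illustration, not details taken from the paper.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of G and C among the unambiguous bases of seq.

    Hypothetical helper for illustration only. Characters other than
    A, C, G and T (for example N) are ignored, which is an assumption.
    """
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return (seq.count("G") + seq.count("C")) / counted


if __name__ == "__main__":
    # Toy sequence, not a real exon.
    print(gc_content("GCCGCGATGC"))
```

Applying such a function to each exon in a set and plotting the resulting values would give a distribution of the kind the panel describes.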
Figure 6.
Translated upstream regions in GluN2 genes, in NHSL1 and in WWC3. (A) uoORFs in GRIN2A (yellow) and GRIN2B (blue). For each gene there are two sets of exons: the upper set (darker shade) shows the potential coding region of the uoORF, and the lower set shows the first coding exon of the principal transcript for each gene. Potential coding exons are shown as wide boxes, non-coding exons as narrower boxes, and introns as black lines. Exons and introns are not to scale. Potential ATG codons are shown as green bars, conserved stop codons as red bars. The frame of each coding ORF is shown (compared to the principal coding exon, which is frame 0). The dark green blocks show where the translation of the LIVBP-like domain (PBP1_iGluR_NMDA_NR2) would start in the principal transcript. The red horizontal line indicates the position of the peptide detected for the GRIN2A uoORF. (B) The resolved cryo-EM structure of the WAVE regulatory complex (PDB: 7usc, (64)) with the sequence of the NHSL1 N-terminal extension mapped onto the homologous WASF1 protein. The WASF1 protein is in blue and yellow: dark blue where it is homologous to the sequence of the NHSL1 isoform, yellow where it is similar to the NHSL1 translated upstream region, and light blue where there was no detectable homology. Homology determined with the HHPRED server (65). The other visible proteins in the complex are the CYFIP1 protein (light green), the BRK1 protein (orange) and the ABI2 protein (teal). (C) The AlphaFold model for the N-terminus of the complete WWC3 protein downloaded from UniProt (T2C6S4) with the novel human N-terminal sequence painted onto the structure. The novel region coded by the translated upstream region (in yellow) completes a WW domain. (D) The upstream exons of WWC3 (not to scale), with the positions of the upstream and downstream ATGs and the two-base gap marked. Peptides detected for the upstream region mentioned in the text are shown above the exons. Peptides found in our analysis are in red, gap-spanning peptides found in PeptideAtlas in blue.

