Genome Biol Evol. 2024 Jul 3;16(7):evae126.
doi: 10.1093/gbe/evae126.

High Polymorphism Levels of De Novo ORFs in a Yoruba Human Population

Covadonga Vara et al. Genome Biol Evol. 2024.

Abstract

During evolution, new open reading frames (ORFs) with the potential to give rise to novel proteins continuously emerge. A recent compilation of noncanonical ORFs with translation signatures in humans has identified thousands of cases with a putative de novo origin. However, it is not known how they are distributed in the population. Are they universally translated? Here, we use ribosome profiling data from 65 lymphoblastoid cell lines (LCLs) from individuals of Yoruba origin to investigate this question. We identify 2,587 de novo ORFs translated in at least one of the cell lines. In line with their de novo origin, the encoded proteins tend to be smaller than 100 amino acids and positively charged. We observe that the de novo ORFs are more polymorphic in the population than the set of canonical proteins, with a substantial fraction of them being translated in only some of the cell lines. Remarkably, this difference remains significant after controlling for differences in translation levels. These results suggest that variation in the translation level of de novo ORFs could be a relevant source of intraspecies phenotypic diversity in humans.
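The notion of an ORF being "polymorphic" in its translation can be illustrated with a small sketch. It assumes that each ORF already has a yes/no translation call per cell line; the criteria the authors used to make those calls are not given in this abstract, and the ORF names, the toy data, and the three category labels below are hypothetical.

```python
from typing import Dict, List


def sharing_fraction(calls: Dict[str, List[bool]]) -> Dict[str, float]:
    """Return, for each ORF, the fraction of cell lines in which it is called translated."""
    return {orf: sum(flags) / len(flags) for orf, flags in calls.items()}


def classify(fraction: float) -> str:
    """Label an ORF by how widely it is translated (labels are illustrative only)."""
    if fraction == 0.0:
        return "no cell lines"
    if fraction == 1.0:
        return "all cell lines"
    return "some cell lines"


# Hypothetical toy data: translation calls for three ORFs across four cell lines.
calls = {
    "canonical_A": [True, True, True, True],
    "denovo_B": [True, False, False, True],
    "denovo_C": [False, False, False, False],
}

fractions = sharing_fraction(calls)
labels = {orf: classify(f) for orf, f in fractions.items()}

for orf in calls:
    print(orf, fractions[orf], labels[orf])
```

In this framing, an ORF translated in only some cell lines is the polymorphic case the abstract describes.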

Keywords: LCL; de novo ORF; human; polymorphism; translation.

Figures

Fig. 1.
Workflow overview and features of de novo ORFs when compared to canonical ORFs. a) Diagram showing the overall workflow for the data analysis. b) Diagram showing a canonical ORF in the genome (line) and the corresponding localization of potential de novo uORFs, dORFs, and lncORFs. c) Natural logarithm of the length distribution in nucleotides of canonical ORFs (n = 18,635, Q1 = 6.72, median = 7.20, Q3 = 7.67) and de novo ORFs (n = 2,587, Q1 = 4.33, median = 4.84, Q3 = 5.83). ***P < 0.001 (Wilcoxon test). d) Translation levels in TPM on average per canonical ORF (n = 18,635, Q1 = 0.0, median = 1.42, Q3 = 3.36) and de novo ORF (n = 2,587, Q1 = 0, median = 0.64, Q3 = 2.19). ***P < 0.001 (Wilcoxon test). Scale is log1p [ln(x + 1)]. e) IP distribution of expressed canonical ORFs (n = 14,921, Q1 = 5.79, median = 6.90, Q3 = 8.66) and expressed de novo ORFs (n = 2,002, Q1 = 6.41, median = 9.03, Q3 = 10.83). ***P < 0.001 (Wilcoxon test).
Fig. 2.
Features of de novo ORFs based on their evolutionary age and biotype. a) Translation levels in TPM on average per de novo ORF of different evolutionary ages in the primate lineage (younger to older): human (n = 96, Q1 = 0.0, median = 0.52, Q3 = 1.97), old-world monkeys (n = 559, Q1 = 0.0, median = 0.33, Q3 = 1.85), and Primatomorpha (n = 1,932, Q1 = 0.0, median = 0.76, Q3 = 2.89). NS, not significant, P > 0.05, *P = 0.01 (pairwise Dunn test). Scale is log1p [ln(x + 1)]. b) Translation levels in TPM on average per de novo ORF based on their biotype: uORFs (n = 1,479, Q1 = 0.0, median = 1.26, Q3 = 2.71), dORFs (n = 313, Q1 = 0.0, median = 0.96, Q3 = 1.98), and lncORF (n = 795, Q1 = 0.0, median = 0.002, Q3 = 1.1). ***P < 0.001 (pairwise Dunn test). Scale is log1p [ln(x + 1)]. c) Proportion of de novo uORF, dORF, and lncORF in different evolutionary ages in the primate lineage (younger to older): human, old-world monkeys, and Primatomorpha. d) Natural logarithm of the median lengths of de novo uORF (n = 1,479, Q1 = 4.25, median = 4.64, Q3 = 5.50), dORF (n = 313, Q1 = 4.41, median = 4.89, Q3 = 5.37), and lncORF (n = 795, Q1 = 4.61, median = 5.25, Q3 = 7.24). NS, not significant, ***P < 0.001 (pairwise Dunn test). Only data for ORFs translated at significant levels in at least one LCL are shown.
Fig. 3.
Proportion of individuals in which the ORF is translated at significant levels. a) Bar plot showing the percentage of LCLs in which de novo ORFs and canonical ORFs were detected. The majority of canonical ORFs are either translated in all individuals or in none, whereas de novo ORFs show a much more variable pattern. b) Plot showing the translation values of the ORFs together with the level of “sharing” or the number of individuals translating the ORF. Comparison between canonical ORFs and de novo ORFs. ***P < 0.001 (Wilcoxon test). c) Proportion of individuals translating de novo ORFs depending on their evolutionary age (older to younger: Primatomorpha, old-world monkeys, human). d) Number of individuals translating de novo ORFs depending on the location of the ORF. Only data for ORFs translated at significant levels in at least one LCL are shown.
Fig. 4.
Polymorphism levels of subsampled canonical ORFs. a) Comparison of the translation levels of subsampled canonical ORFs and de novo ORFs. The subsampled canonical ORFs correspond to canonical ORFs with the same translation level distribution as de novo ORFs. NS, not significant (Wilcoxon test). b) Bar plot representing the proportion of individuals in which the ORF is translated at significant levels. Comparison between subsampled canonical ORFs and de novo ORFs. c) Plot showing the translation values of the ORFs together with the level of sharing or number of individuals translating the ORF. Comparison between subsampled canonical ORFs and de novo ORFs. NS, not significant (Wilcoxon test).
Fig. 5.
Translation signatures of a de novo ORF and a canonical ORF in four human LCLs. a) Ribo-Seq coverage tracks are shown for four samples included in the data set from Battle et al. (2015), for the lncRNA MIR4458HG and the gene ANKRD33B. b) Amino acid sequence encoded by the de novo ORF in MIR4458HG with the amino acid type color-coded (yellow, nonpolar; green, positive; purple, negative).
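The subsampling idea behind Fig. 4 (drawing canonical ORFs so that their translation levels follow the same distribution as the de novo ORFs) can be sketched as follows. This is only one plausible way to do it, not the authors' procedure, which is not described on this page: the binning on the ln(x + 1) scale, the bin width of 0.5, the function names, and the toy TPM values are all assumptions.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List


def bin_index(tpm: float, width: float = 0.5) -> int:
    """Assign a TPM value to a bin on the ln(x + 1) scale (bin width is an assumption)."""
    return int(math.log1p(tpm) // width)


def matched_subsample(
    target: Dict[str, float],
    pool: Dict[str, float],
    width: float = 0.5,
    seed: int = 0,
) -> List[str]:
    """Draw ORFs from `pool` so that each bin holds as many as `target` has in that bin.

    If the pool has too few ORFs in a bin, all of them are taken.
    """
    rng = random.Random(seed)

    # Group the pool ORFs by translation-level bin.
    pool_bins = defaultdict(list)
    for name, tpm in sorted(pool.items()):
        pool_bins[bin_index(tpm, width)].append(name)

    # Count how many target ORFs fall in each bin.
    need = defaultdict(int)
    for tpm in target.values():
        need[bin_index(tpm, width)] += 1

    # Sample without replacement, bin by bin.
    chosen: List[str] = []
    for b, n in sorted(need.items()):
        candidates = pool_bins.get(b, [])
        chosen.extend(rng.sample(candidates, min(n, len(candidates))))
    return chosen


# Hypothetical toy data: mean TPM per ORF.
de_novo = {"d1": 0.0, "d2": 0.1, "d3": 2.0}
canonical = {"c1": 0.0, "c2": 0.2, "c3": 0.3, "c4": 2.1, "c5": 50.0, "c6": 100.0}

subsample = matched_subsample(de_novo, canonical)
print(subsample)
```

Highly translated canonical ORFs such as the hypothetical `c5` and `c6` are never drawn because no de novo ORF falls in their bins, so any remaining difference in sharing between the two sets is less likely to be explained by translation level alone.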

References

    1. Battle A, Khan Z, Wang SH, Mitrano A, Ford MJ, Pritchard JK, Gilad Y. Impact of regulatory variation from RNA to protein. Science. 2015:347(6222):664–667. 10.1126/science.1260793.
    2. Blevins WR, Ruiz-Orera J, Messeguer X, Blasco-Moreno B, Villanueva-Cañas JL, Espinar L, Díez J, Carey LB, Albà MM. Uncovering de novo gene birth in yeast using deep transcriptomics. Nat Commun. 2021:12(1):604. 10.1038/s41467-021-20911-3.
    3. Broeils LA, Ruiz-Orera J, Snel B, Hubner N, van Heesch S. Evolution and implications of de novo genes in humans. Nat Ecol Evol. 2023:7(6):804–815. 10.1038/s41559-023-02014-y.
    4. Carvunis A-R, Rolland T, Wapinski I, Calderwood MA, Yildirim MA, Simonis N, Charloteaux B, Hidalgo CA, Barbette J, Santhanam B, et al. Proto-genes and de novo gene birth. Nature. 2012:487(7407):370–374. 10.1038/nature11184.
    5. Charif D, Lobry JR. Seqinr 1.0-2: a contributed package to the R project for statistical computing devoted to biological sequences retrieval and analysis. In: Bastolla U, Porto M, Roman HE, Vendruscolo M, editors. Structural approaches to sequence evolution: molecules, networks, populations. New York: Biological and Medical Physics, Biomedical Engineering Springer Verlag; 2007. p. 207–232.