Interdiscip Sci. 2022 Mar;14(1):101-112.
doi: 10.1007/s12539-021-00464-1. Epub 2021 Jul 25.

LncRNA-Encoded Short Peptides Identification Using Feature Subset Recombination and Ensemble Learning

Siyuan Zhao et al. Interdiscip Sci. 2022 Mar.

Abstract

Long non-coding RNA (lncRNA), a type of non-coding RNA, has been reported to contain short open reading frames (sORFs). sORF-encoded short peptides (SEPs) have been shown to play a crucial role in regulating biological processes such as growth, development, and resistance response. Identifying SEPs is vital to further understanding their function. However, methods for identifying SEPs effectively and rapidly are still lacking. In this study, lncPepid, a novel method for identifying lncRNA-encoded short peptides based on feature subset recombination and ensemble learning, is developed. lncPepid transforms the data of Zea mays and Arabidopsis thaliana into hybrid features from two aspects, sequence composition and physicochemical properties, separately. It optimizes the hybrid features with a novel weighted iteration-based feature selection method that recombines them into a stable subset characterizing SEPs effectively. Different classification models with different optimized features are constructed and tested separately. The outputs of the optimal models are integrated for ensemble classification to improve efficiency. Experimental results show that the geometric mean of sensitivity and specificity of lncPepid is about 70% for the identification of functional SEPs derived from multiple species. lncPepid is thus an effective and rapid method for identifying lncRNA-encoded short peptides. This study can be extended to research on SEPs from other species and has crucial implications for further findings and studies in functional genomics.
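
The headline metric above, the geometric mean of sensitivity and specificity, is the square root of their product. A minimal sketch of how it can be computed from confusion-matrix counts follows; the function name `g_mean` and the example counts are hypothetical illustrations, not values or code from the study.

```python
import math


def g_mean(tp: int, fn: int, tn: int, fp: int) -> float:
    """Geometric mean of sensitivity and specificity.

    tp/fn are true positives and false negatives (real SEPs),
    tn/fp are true negatives and false positives (non-SEPs).
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative sample")
    sensitivity = tp / (tp + fn)  # fraction of positives correctly identified
    specificity = tn / (tn + fp)  # fraction of negatives correctly rejected
    return math.sqrt(sensitivity * specificity)


# Hypothetical counts chosen only to illustrate the formula.
score = g_mean(tp=70, fn=30, tn=70, fp=30)
print(f"G-mean: {score:.2f}")
```

Because it multiplies the two rates, this metric stays low whenever either class is poorly recognized, which is why it is often preferred over plain accuracy when positive and negative samples are imbalanced.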

Keywords: Ensemble learning; Feature subset recombination; Long non-coding RNA; Short open reading frames; Short peptides.


