LncRNA-Encoded Short Peptides Identification Using Feature Subset Recombination and Ensemble Learning
- PMID: 34304369
- DOI: 10.1007/s12539-021-00464-1
Abstract
Long non-coding RNA (lncRNA), a type of non-coding RNA, has been reported to contain short open reading frames (sORFs). sORF-encoded short peptides (SEPs) have been shown to play a crucial role in regulating biological processes such as growth, development, and resistance response. Identifying SEPs is vital to further understanding their function. However, methods for identifying SEPs effectively and rapidly are still lacking. In this study, lncPepid, a novel method for identifying lncRNA-encoded short peptides based on feature subset recombination and ensemble learning, is developed. lncPepid transforms the data of Zea mays and Arabidopsis thaliana into hybrid features from two aspects, sequence composition and physicochemical properties, separately. It optimizes the hybrid features with a novel weighted iteration-based feature selection method that recombines a stable subset characterizing SEPs effectively. Different classification models with different optimized features are constructed and tested separately, and the outputs of the optimal models are integrated for ensemble classification to improve efficiency. Experimental results show that the geometric mean of sensitivity and specificity of lncPepid is about 70% for the identification of functional SEPs derived from multiple species. lncPepid is therefore an effective and rapid method for identifying lncRNA-encoded short peptides. This study can be extended to research on SEPs from other species and has crucial implications for further findings and studies in functional genomics.
Keywords: Ensemble learning; Feature subset recombination; Long non-coding RNA; Short open reading frames; Short peptides.
© 2021. International Association of Scientists in the Interdisciplinary Areas.
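The abstract reports performance as the geometric mean of sensitivity and specificity, and describes integrating the outputs of several optimal models. The minimal Python sketch below illustrates both ideas on made-up numbers. The function names `g_mean` and `average_ensemble`, the averaging rule, the 0.5 threshold, and the toy probabilities are all assumptions for illustration; the abstract does not state how lncPepid combines its model outputs, so this is not the authors' implementation.

```python
import math


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity for binary labels
    (1 = SEP, 0 = non-SEP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def average_ensemble(prob_lists, threshold=0.5):
    """Average the positive-class probabilities of several models and
    threshold the result. ASSUMPTION: simple averaging is used here only
    as one common way to integrate model outputs."""
    n_models = len(prob_lists)
    averaged = [sum(col) / n_models for col in zip(*prob_lists)]
    return [1 if p >= threshold else 0 for p in averaged]


# Toy data (hypothetical): 4 positive and 4 negative samples, 3 models.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
model_probs = [
    [0.9, 0.6, 0.4, 0.2, 0.3, 0.1, 0.7, 0.2],
    [0.8, 0.7, 0.6, 0.3, 0.2, 0.4, 0.6, 0.1],
    [0.7, 0.4, 0.7, 0.1, 0.4, 0.2, 0.3, 0.3],
]

y_pred = average_ensemble(model_probs)
score = g_mean(y_true, y_pred)
print(y_pred, round(score, 2))
```

On this toy input the ensemble makes one false negative and one false positive, so sensitivity and specificity are both 0.75 and their geometric mean is 0.75. Unlike plain accuracy, the geometric mean drops toward zero when either class is poorly recognized, which makes it a common choice when positive and negative samples are imbalanced.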