Front Microbiol. 2022 Oct 12;13:1015140.
doi: 10.3389/fmicb.2022.1015140. eCollection 2022.

Mirror proteases of Ac-Trypsin and Ac-LysargiNase precisely improve novel event identifications in Mycolicibacterium smegmatis MC2 155 by proteogenomic analysis

Songhao Jiang et al. Front Microbiol. 2022.

Abstract

Accurate identification of novel peptides remains challenging in large-scale proteogenomic studies because evaluation criteria are lacking. The mirror proteases trypsin and lysargiNase generate complementary b and y ion series, which makes it possible to assess the authenticity of novel peptides experimentally rather than only filtering candidates by false discovery rate (FDR) ranking. In this study, a pair of in-house developed acetylated mirror proteases, Ac-Trypsin and Ac-LysargiNase, was used for proteogenomic analysis of Mycolicibacterium smegmatis MC2 155. The mirror proteases accurately identified 368 novel peptides, exhibiting 75-80% b and y ion coverages, against 65-68% y or b ion coverages with Ac-Trypsin alone (38.9% b and 68.3% y) or Ac-LysargiNase alone (65.5% b and 39.6% y), as was also seen for annotated peptides from M. smegmatis MC2 155. The complementary b and y ion series substantially increased the reliability of the overlapping sequences derived from novel peptides. Among these novel peptides, 311 were annotated in other public M. smegmatis strains, and 57 novel peptides with more continuous b and y ion pairs were retained for further analysis after spectral quality assessment. With these peptides, the mirror proteases corrected the N-termini of six annotated proteins and detected 17 new coding open reading frames (ORFs). We believe that mirror proteases will be an effective strategy for novel peptide detection in both prokaryotic and eukaryotic proteogenomics.

Keywords: Ac-LysargiNase; Ac-Trypsin; Mycolicibacterium smegmatis; mirror; proteogenomics.
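
The mirror relationship described in the abstract can be illustrated with a minimal in-silico digestion sketch. It assumes only the textbook cleavage rules (trypsin cleaves C-terminal to K/R, LysargiNase cleaves N-terminal to K/R) and ignores missed cleavages, proline effects, and the acetylation of the enzymes. The core sequence comes from the Figure 5 caption; the flanking residues and all function names are hypothetical and are not taken from the authors' pipeline.

```python
def trypsin_digest(seq):
    """Cleave C-terminal to every K or R (simplified rule, no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        if aa in "KR":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])
    return peptides


def lysarginase_digest(seq):
    """Cleave N-terminal to every K or R (simplified rule, no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        if aa in "KR" and i > start:
            peptides.append(seq[start:i])
            start = i
    if start < len(seq):
        peptides.append(seq[start:])
    return peptides


# Core sequence RDVAQVVGHQHGR is from the Figure 5 caption.
# The flanking residues "GS" and "AL" are hypothetical, added for illustration.
SEQ = "GS" + "RDVAQVVGHQHGR" + "AL"

tryptic = trypsin_digest(SEQ)       # ['GSR', 'DVAQVVGHQHGR', 'AL']
mirrored = lysarginase_digest(SEQ)  # ['GS', 'RDVAQVVGHQHG', 'RAL']
print(tryptic)
print(mirrored)
```

The tryptic peptide carries its basic residue at the C-terminus and the LysargiNase peptide carries it at the N-terminus, while both share the core DVAQVVGHQHG. This is the overlap that lets the two fragment ion series be compared.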

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1
Workflow of this study. Our in-house mirror proteases, Ac-Trypsin and Ac-LysargiNase, were used to assess the authenticity of novel peptides in a large-scale proteogenomic study based on 14N- and 15N-labeled cells of M. smegmatis MC2 155. The verified novel peptides were further used for novel event analysis, including N-termini corrections and novel ORF identifications.
Figure 2
Mirror proteases improved the proteome coverage of M. smegmatis MC2 155 in searches against the annotated database. (A) The application of Ac-Trypsin and Ac-LysargiNase in proteomics of M. smegmatis MC2 155. Comparison of the peptides (B) and proteins (C) identified by Ac-Trypsin and Ac-LysargiNase. (D) Protein identification saturation using gel-separation methods with Ac-Trypsin and Ac-LysargiNase. (E) Comparison of the protein sequence coverage by single protease and combined mirror proteases. (F) The proportion of N-terminal and C-terminal spectra in each fraction from Ac-Trypsin and Ac-LysargiNase digests. (G) Comparison of the ion coverages from single protease and combined mirror proteases for annotated peptides. Statistically significant differences by Student's t-test are indicated as *p < 0.05, **p < 0.01, and ***p < 0.001.
Figure 3
Mirror proteases efficiently identified credible novel peptides. (A) Proteogenomic identification based on the six-frame database of M. smegmatis MC2 155. (B) Venn diagram of the proteins identified from the Ac-Trypsin and Ac-LysargiNase datasets. (C) Comparison of the ion coverages from single protease and combined mirror proteases for novel peptides. Comparison of the Q-values (D) and raw scores (E) of spectra supported by single-protease and mirror-protease digestion evidence from the Ac-Trypsin and Ac-LysargiNase datasets.
Figure 4
N-termini correction of six annotated proteins. (A) Distribution of novel peptides in N-terminal extension regions. A star marks peptides identified from 14N- and 15N-labeled spectra. The confirmed spectra of N-terminal peptides in 14N- and 15N-labeled forms were derived from the Ac-Trypsin (B) and Ac-LysargiNase (C) datasets, respectively. (D) The spectra of an N-terminally labeled peptide with dimethyl modification from our N-terminomic dataset.
Figure 5
Verification of 17 novel ORFs. (A) Distribution of novel peptides in non-coding regions. A star marks peptides identified from 14N- and 15N-labeled spectra. (B) FPKM rank of annotated and novel ORFs based on a public RNA-seq dataset. (C) Distribution of novel peptides digested by Ac-Trypsin and Ac-LysargiNase in orf|0|+|1060315-1061149|. The underlined sequences were identified as peptides derived from Ac-Trypsin and Ac-LysargiNase digestion in this study. The blue and orange arrows indicate the Ac-Trypsin and Ac-LysargiNase cleavage sites, respectively. The credible spectra of two different peptides, (R)DVAQVVGHQHGR and RDVAQVVGHQHG(R), from the Ac-Trypsin (D) and Ac-LysargiNase (E) datasets.
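
The six-frame database named in the Figure 3 caption can be illustrated with a short translation sketch. It assumes the standard genetic code amino acid assignments, which bacterial translation table 11 shares. The function names and the toy DNA string are hypothetical, and this is not the authors' database construction pipeline.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame(dna):
    """Translate both strands in all three reading frames."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


# Toy input for illustration only, not a sequence from M. smegmatis.
frames = six_frame("ATGGCCTAA")
for name, protein in frames.items():
    print(name, protein)
```

In a proteogenomic search, each frame would typically be split at stop codons into candidate ORFs before spectra are matched against them. That step is left out here to keep the sketch short.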
