J Proteome Res. 2016 Aug 5;15(8):2422-32.
doi: 10.1021/acs.jproteome.5b01098. Epub 2016 Jul 1.

Characterization of Proteoforms with Unknown Post-translational Modifications Using the MIScore

Qiang Kou et al. J Proteome Res.

Abstract

A single gene may give rise to various proteoforms through primary structure alterations (PSAs) such as genetic variations, alternative splicing, and post-translational modifications (PTMs). Top-down mass spectrometry can analyze intact proteins and identify patterns of multiple PSAs, making it the method of choice for studying complex proteoforms. In top-down proteomics, proteoforms are often identified by searching tandem mass spectra against a protein sequence database that contains only one reference protein sequence for each gene or transcript variant in a proteome. Because the protein database is incomplete, an identified proteoform may contain PSAs that are absent from the reference sequence. Proteoform characterization is the task of identifying and localizing the PSAs in a proteoform. Although many software tools have been proposed for proteoform identification by top-down mass spectrometry, the characterization of proteoforms in identified proteoform-spectrum matches (PrSMs) still relies mainly on manual annotation. We propose the Modification Identification Score (MIScore), which is based on Bayesian models, to automatically identify and localize PTMs in proteoforms. Experiments showed that the MIScore is accurate in identifying and localizing one or two modifications.

Keywords: post-translational modification; proteoform.

Figures

Figure 1
Illustration of the conversion from a deconvoluted spectrum of neutral masses to a binary string. A spectrum (top) has three neutral fragment masses, 2.2, 3.9, and 8.1 Da (peak intensities are ignored), and a precursor mass of 10.1 Da. The precursor and fragment masses are discretized by multiplying them by a scale factor of 1 and rounding to integers, which gives a spectrum with a precursor mass of 10 and three fragment masses of 2, 4, and 8. The discretized spectrum is converted to the binary string 0101000100. The length of the string equals the integer precursor mass; the three 1s mark the positions of the three fragment masses.
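The conversion described in the Figure 1 caption can be sketched in a few lines of Python. This is only an illustration of the caption, not the authors' code: the function name `spectrum_to_binary_string` and its parameters are hypothetical, and Python's built-in `round` is assumed to be an acceptable stand-in for the rounding rule used in the paper.

```python
def spectrum_to_binary_string(precursor_mass, fragment_masses, scale=1.0):
    """Discretize a deconvoluted spectrum and encode it as a binary string.

    Hypothetical helper for illustration. Masses are multiplied by `scale`
    and rounded to integers; peak intensities are ignored.
    """
    # The string length equals the integer precursor mass.
    length = round(precursor_mass * scale)
    bits = ["0"] * length
    for mass in fragment_masses:
        position = round(mass * scale)
        # Position k (1-based) is set to 1 when a fragment has integer mass k.
        if 1 <= position <= length:
            bits[position - 1] = "1"
    return "".join(bits)


# Example from the Figure 1 caption: precursor 10.1 Da, fragments 2.2, 3.9, 8.1 Da.
print(spectrum_to_binary_string(10.1, [2.2, 3.9, 8.1]))  # prints 0101000100
```

Note that Python's `round` sends exact halves to the nearest even integer, which may differ from the rounding used in the original work for masses that fall exactly on .5 after scaling.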
Figure 2
The three-dimensional table $D(f, g, h)$ for a discretized spectrum with a precursor mass of 848 and four neutral fragment masses (131, 413, 421, and 550), a protein sequence MSDYCH, and an ordered pair of modifications (phosphorylation, methylation). A scale factor of 1 is used in the computation. (a) $B_{0,g}$ is the sum of the masses of the first $g$ residues of the protein. $B_{1,g}$ is the sum of $B_{0,g}$ and the mass of phosphorylation (80 Da). $B_{2,g}$ is the sum of $B_{0,g}$ and the masses of phosphorylation (80 Da) and methylation (14 Da). (b) Table $s_{f,g}$ is generated from $B_{f,g}$ using Equation (3). (c) $D(f, g, h)$ is filled in by the dynamic programming algorithm in Figure S1 in the supplementary material. The shaded areas are initialized using Equation (4). The second residue, S, is a modification site of phosphorylation, so the value $D(1, 2, 2)$ is computed as $D(0, 1, 2 - s_{1,2}) + D(1, 1, 2 - s_{1,2}) = D(0, 1, 1) + D(1, 1, 1)$. Similarly, the fifth residue, C, is a modification site of methylation, so the value $D(2, 5, 3)$ is computed as $D(1, 4, 3 - s_{2,5}) + D(2, 4, 3 - s_{2,5}) = D(1, 4, 3) + D(2, 4, 3)$.
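As a rough illustration of panels (a) and (b) of Figure 2, the sketch below builds the prefix mass table for MSDYCH and a 0/1 match table. Several things here are assumptions rather than content of the page: the function names are hypothetical, the residue masses are nominal integer values, and Equation (3) is not reproduced here, so the match rule (a prefix mass, or its complement with respect to the precursor mass, appears among the fragment masses) is only one reading that agrees with the two values implied by the caption, $s_{1,2} = 1$ and $s_{2,5} = 0$.

```python
from itertools import accumulate

# Nominal (integer) residue masses in Da; an assumption made for this sketch.
RESIDUE_MASS = {"M": 131, "S": 87, "D": 115, "Y": 163, "C": 103, "H": 137}


def prefix_mass_table(sequence, mod_masses):
    """Return B where B[f][g] is the mass of the first g residues plus the
    masses of the first f modifications. Column 0 is a None placeholder so
    that g is 1-based, as in the caption."""
    prefix = list(accumulate(RESIDUE_MASS[r] for r in sequence))
    shifts = [0] + list(accumulate(mod_masses))
    return [[None] + [mass + shift for mass in prefix] for shift in shifts]


def match_table(B, precursor_mass, fragment_masses):
    """Return s where s[f][g] is 1 if B[f][g] or precursor_mass - B[f][g] is a
    fragment mass, else 0. This rule is an assumed stand-in for Equation (3)."""
    fragments = set(fragment_masses)
    return [
        [None] + [int(b in fragments or precursor_mass - b in fragments)
                  for b in row[1:]]
        for row in B
    ]


# Values from the Figure 2 caption: phosphorylation (80 Da), methylation (14 Da).
B = prefix_mass_table("MSDYCH", [80, 14])
s = match_table(B, 848, [131, 413, 421, 550])
print(B[0][1:])            # unmodified prefix masses
print(s[1][2], s[2][5])    # the two entries used in the caption's examples
```

With these inputs, the residue masses, the two modifications, and one water (18 Da) add up to the precursor mass of 848 given in the caption, which is a useful sanity check on the nominal masses.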
Figure 3
The modification sites reported by the MIScore method for the 6100 PrSMs with one modification are grouped into bins of width 0.1 based on their MIScores. The average identification score and the accuracy rate of the modification sites in each bin are compared.
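The binning behind Figure 3 can be sketched as follows. This is a hedged illustration only: the function name `summarize_bins`, the input format (pairs of score and a correctness flag), and the example values are hypothetical, and the sketch assumes that MIScores lie in the range 0 to 1.

```python
def summarize_bins(sites, width=0.1):
    """Group (score, is_correct) pairs into bins of the given width and return
    (bin_index, average_score, accuracy_rate) for every non-empty bin.

    Assumes scores lie in [0, 1]; a score of exactly 1.0 goes in the last bin.
    """
    n_bins = round(1 / width)
    bins = [[] for _ in range(n_bins)]
    for score, is_correct in sites:
        index = min(int(score / width), n_bins - 1)
        bins[index].append((score, is_correct))

    summary = []
    for index, members in enumerate(bins):
        if not members:
            continue
        average_score = sum(score for score, _ in members) / len(members)
        accuracy_rate = sum(1 for _, ok in members if ok) / len(members)
        summary.append((index, average_score, accuracy_rate))
    return summary


# Made-up example data, not results from the paper.
example_sites = [(0.05, False), (0.07, True), (0.95, True), (1.0, True)]
for index, average_score, accuracy_rate in summarize_bins(example_sites):
    print(index, round(average_score, 3), accuracy_rate)
```

If the score behaves like a well-calibrated probability, the average score and the accuracy rate in each bin should be close to each other, which is the comparison the figure draws.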
