Proteins. 2015 Mar;83(3):411-27.
doi: 10.1002/prot.24746. Epub 2015 Jan 13.

Refinement by shifting secondary structure elements improves sequence alignments

Jing Tong et al. Proteins. 2015 Mar.

Abstract

Constructing a model of a query protein based on its alignment to a homolog with experimentally determined spatial structure (the template) is still the most reliable approach to structure prediction. Alignment errors are the main bottleneck for homology modeling when the query is distantly related to the template. Alignment methods often misalign secondary structural elements by a few residues. Therefore, better alignment solutions can be found within a limited set of local shifts of secondary structures. We present a refinement method to improve pairwise sequence alignments by evaluating alignment variants generated by local shifts of template-defined secondary structures. Our method SFESA is based on a novel scoring function that combines the profile-based sequence score and the structure score derived from residue contacts in a template. Such a combined score frequently selects a better alignment variant among a set of candidate alignments generated by local shifts and leads to an overall increase in alignment accuracy. Evaluation on several benchmarks shows that our refinement method significantly improves alignments made by automatic methods such as PROMALS, HHpred and CNFpred. The web server is available at http://prodata.swmed.edu/sfesa.

Keywords: alignment improvement; alignment refinement; contact energy; local secondary structure shifting; pairwise alignment.

Figures

Figure 1. An overview of the SFESA method
(A) For each alignment block, SFESA generates up to eight variants by shifting the block by one to four residues in either direction (marked as -1, -2, -3, -4, +1, +2, +3, and +4). The pink boxes show the SSEs recognized from the template structure and the blue boxes are the corresponding regions in the query aligned to those SSEs. The residues and gaps in a corresponding pair of blue and pink boxes make up an alignment block. The corresponding black lines mark the boundaries between which sequence and structure scores are calculated for each aligned residue pair. (B) If gap shifting is considered, two variants (left and right) are generated by putting gaps on the same side (left or right) before generating the above 8 variants. (C) Flowchart of the SFESA method.
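The shift step in panel (A) can be illustrated with a short sketch. This is not the SFESA implementation: the block representation, the function names `shift_block` and `generate_variants`, and the restriction to gapless blocks are assumptions made here for illustration. The sign convention (a +1 shift pairs template residue i with query residue l-1) follows the Figure 2 caption, and the sketch ignores the block boundaries and gap handling described in panels (A) and (B).

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: (template_index, query_index) pairs
# for one gapless alignment block.
Block = List[Tuple[int, int]]

MAX_SHIFT = 4  # the caption describes shifts of 1 to 4 residues each way


def shift_block(block: Block, shift: int, query_len: int) -> Optional[Block]:
    """Re-align every template residue to the query residue `shift`
    positions earlier (+1 pairs template residue i with query residue l-1).
    Return None if the shifted block would run off either end of the query."""
    shifted = [(t, q - shift) for t, q in block]
    if any(q < 0 or q >= query_len for _, q in shifted):
        return None
    return shifted


def generate_variants(
    block: Block, query_len: int, max_shift: int = MAX_SHIFT
) -> Dict[int, Block]:
    """Collect the valid shifted variants, keyed by the signed shift."""
    variants: Dict[int, Block] = {}
    for shift in range(-max_shift, max_shift + 1):
        if shift == 0:
            continue  # shift 0 is the original block, not a variant
        candidate = shift_block(block, shift, query_len)
        if candidate is not None:
            variants[shift] = candidate
    return variants


if __name__ == "__main__":
    block = [(10, 20), (11, 21), (12, 22)]
    for shift, variant in sorted(generate_variants(block, query_len=100).items()):
        print(f"{shift:+d}: {variant}")
```

A block far from the sequence ends yields all eight variants; a block near either end yields fewer, which matches the "up to" wording in the caption.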
Figure 2. The template contact residue pairs are transferred to the query through the original alignment to calculate the structure score for the original alignment block and the alignment variants
The blue and red filled circles represent residues in the query and template, respectively. The dashed lines connect aligned residue pairs in the original alignment. Residue i is in contact with residues j1, j2, j3, … jn based on the template structure. Residue l in the query is aligned with i and is inferred to be in contact with residues k1, k2, k3, … kn that are aligned to j1, j2, j3, … jn. The contact-based score for residue l is calculated by Eq (5). In the case of a +1 shift, residue l-1 is aligned to residue i, and the inferred contacts are between residue l-1 and k1, k2, k3, … kn (shown as dashed lines).
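The contact transfer described in this caption can be sketched as follows. Eq (5) is not reproduced on this page, so the scoring function below is a placeholder and not the SFESA contact energy: `toy_pair_energy`, the dictionary-based inputs, and the function names are all hypothetical. The only parts taken from the caption are that the contact partners k1 … kn come from the original alignment and that a +1 shift moves the scored residue from l to l-1 while the partners stay fixed.

```python
from typing import Callable, Dict, List, Tuple


def inferred_contacts(
    i: int,
    template_contacts: Dict[int, List[int]],
    template_to_query: Dict[int, int],
    shift: int = 0,
) -> Tuple[int, List[int]]:
    """Transfer the contacts of template residue i to the query.

    Partners are mapped through the ORIGINAL alignment (template_to_query);
    template partners with no aligned query residue are skipped. With a
    nonzero shift, the query residue paired with i becomes l - shift."""
    l = template_to_query[i]
    partners = [
        template_to_query[j]
        for j in template_contacts.get(i, [])
        if j in template_to_query
    ]
    return l - shift, partners


def contact_score(
    query_seq: str,
    residue: int,
    partners: List[int],
    pair_energy: Callable[[str, str], float],
) -> float:
    """Sum a pairwise energy over the inferred contacts (placeholder for Eq (5))."""
    return sum(pair_energy(query_seq[residue], query_seq[k]) for k in partners)


def toy_pair_energy(a: str, b: str) -> float:
    """Hypothetical potential: reward contacts between two hydrophobic residues."""
    hydrophobic = set("AVILMFWC")
    return -1.0 if a in hydrophobic and b in hydrophobic else 0.0


if __name__ == "__main__":
    template_contacts = {5: [9, 12, 30]}   # template residue 30 is unaligned
    template_to_query = {5: 7, 9: 11, 12: 14}
    query_seq = "GGGGGGKLGGGVGGI"
    for shift in (0, 1):
        res, partners = inferred_contacts(5, template_contacts, template_to_query, shift)
        score = contact_score(query_seq, res, partners, toy_pair_energy)
        print(shift, res, partners, score)
```

In this toy input the original alignment places a leucine against two hydrophobic partners, while the +1 variant places a lysine there, so the original block receives the lower (more favorable) placeholder score.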
Figure 3. Tests on our training subsets divided by four SCOP classes
DALI Q-score is compared in different subsets: 275 class a alignments (all α proteins), 352 class b alignments (all β proteins), 455 class c alignments (α and β proteins (α/β)) and 515 class d alignments (α and β proteins (α+β)). The blue column represents the performance of PROMALS alignments. The red column shows the SFESA (O+G+M) results with parameters derived from all data (1675 alignments). The green and purple columns are the SFESA (O+G+M) results trained on class b and class c, respectively. The error bars (standard error of the mean) are shown.
Figure 4. Alignment block-level evaluation of SFESA performance on different datasets
(A) Evaluation on our training dataset (1675 alignments). (B) Evaluation on the MUSTER benchmark (300 alignments). (C) Evaluation on the SALIGN benchmark (200 alignments). SFESA (O+G+M+S) is used to refine alignments generated by PROMALS, and the Dali structure alignment is used as the reference. The blue column represents the number of alignments in which a certain number of aligned blocks were improved by SFESA. The red column represents the number of alignments in which a certain number of aligned blocks were deteriorated by SFESA. The columns at “0” on the x-axis show the number of alignments where none of the alignment blocks were improved (blue) by SFESA and the number of alignments where none of the alignment blocks were deteriorated (red) by SFESA. The number of alignment cases in each category and the percentage are shown above each column.
Figure 5. DALI Q-score for the MUSTER benchmark
(A) Scatter plot of SFESA (O+G+M+S) Q-score (applied to PROMALS) vs. PROMALS Q-score. Each point represents one domain pair. (B) The number of alignments for which SFESA is better than PROMALS in Q-score and the number of alignments for which PROMALS is better than SFESA at different Q-score difference cutoffs. (C) Scatter plot of SFESA (O+G+M+S) Q-score (applied to CNFpred) vs. CNFpred Q-score. (D) The number of alignments for which SFESA is better than CNFpred in Q-score and the number of alignments for which CNFpred is better than SFESA at different Q-score difference cutoffs.
Figure 6. Three examples of SFESA refinement
(A) The alignments between d2ffsa1 (query) and d2qpva1 (template) generated by PROMALS and SFESA (O+G) + PROMALS. (B) The partial alignments between d1c7qa_ (query) and d1iata_ (template) generated by PROMALS and SFESA (O) + PROMALS. (C) The alignments between d1j8yf1 (query) and d1vmaa1 (template) generated by PROMALS and SFESA (O) + PROMALS. The pink boxes show the SSEs recognized from the template and the blue boxes are the regions in the query aligned to those SSEs. Each corresponding pair of blue and pink regions is an alignment block. An asterisk between two aligned residues indicates that the aligned residue pair agrees with the DALI alignment (reference).
Figure 7. An example of SFESA correction of a misaligned active site residue
(A) Superposition of d1h97a_ (query) and d1tu9a_ (template) based on the DALI structure alignment (reference). The blue (query) and pink (template) α-helical regions indicate the alignment block. The histidine residues are the active site residues in contact with hemes (shown in lines). LYS96 and HIS98 in the query are incorrectly aligned to HIS76 and ARG78 in the template in the PROMALS alignment, respectively. The sidechains of these residues are shown in sticks. (B) Alignments of DALI (reference), PROMALS and SFESA in this region. All SFESA modes can generate this alignment refinement.
