Iterative sequence/secondary structure search for protein homologs: comparison with amino acid sequence alignments and application to fold recognition in genome databases

doi:10.1093/bioinformatics/16.11.988

Comparative Study

Bioinformatics. 2000 Nov;16(11):988-1002.

A Wallqvist¹, Y Fukunishi, L R Murphy, A Fadel, R M Levy

PMID: 11159310
DOI: 10.1093/bioinformatics/16.11.988

A Wallqvist et al. Bioinformatics. 2000 Nov.

Affiliation

¹ Department of Chemistry, Rutgers University, Wright-Rieman Laboratories, 610 Taylor Rd, Piscataway, NJ 08854-8087, USA. anders@rutchem.rutgers.edu

Abstract

Motivation: Sequence alignment techniques have been developed into extremely powerful tools for identifying the folding families and function of proteins in newly sequenced genomes. For a sufficiently low sequence identity, it is necessary to incorporate additional structural information to positively detect homologous proteins. We have carried out an extensive analysis of the effectiveness of incorporating secondary structure information directly into the alignments for fold recognition and identification of distant protein homologs. A secondary structure similarity matrix based on a database of three-dimensionally aligned proteins was first constructed. An iterative application of dynamic programming was then used that incorporates linear combinations of amino acid and secondary structure sequence similarity scores. Initially, only primary sequence information is used. Subsequently, contributions from secondary structure are phased in, and new homologous proteins are positively identified if their scores are consistent with the predetermined error rate.
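
The linear combination of amino acid and secondary structure similarity scores described above can be illustrated with a minimal sketch. This is not the authors' implementation: the toy match/mismatch values, the linear gap penalty, the function name `combined_local_score`, and the `weight` parameter are all assumptions made for illustration, and the paper's actual similarity matrices and gap model are not given in this abstract.

```python
def combined_local_score(seq_a, ss_a, seq_b, ss_b, weight, gap=-4.0):
    """Best local (Smith-Waterman style) alignment score when each aligned
    position is scored as (1 - weight) * S_aa + weight * S_ss.

    All scoring values here are toy assumptions, not the published matrices.
    """
    if len(seq_a) != len(ss_a) or len(seq_b) != len(ss_b):
        raise ValueError("each sequence needs a secondary structure string of equal length")

    def aa_score(x, y):
        # Hypothetical amino acid similarity: identity vs. mismatch only.
        return 2.0 if x == y else -1.0

    def ss_score(x, y):
        # Hypothetical secondary structure similarity (e.g. H, E, C states).
        return 1.0 if x == y else -1.0

    prev = [0.0] * (len(seq_b) + 1)
    best = 0.0
    for i in range(1, len(seq_a) + 1):
        curr = [0.0] * (len(seq_b) + 1)
        for j in range(1, len(seq_b) + 1):
            match = ((1.0 - weight) * aa_score(seq_a[i - 1], seq_b[j - 1])
                     + weight * ss_score(ss_a[i - 1], ss_b[j - 1]))
            # Local alignment recurrence with a linear gap penalty (assumption).
            curr[j] = max(0.0, prev[j - 1] + match, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Phase the secondary structure contribution in over successive passes.
for w in (0.0, 0.5, 1.0):
    print(w, combined_local_score("AAAA", "HHHH", "GGGG", "HHHH", w))
```

With `weight = 0.0` the score depends only on the amino acid sequences, which mirrors the first pass described above; raising `weight` lets two sequences with no residue identity but matching secondary structure start to score above zero. Deciding whether such a score counts as a hit would still require calibration against a chosen error rate, which this sketch does not attempt.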

Results: We used the SCOP40 database, where only PDB sequences that have 40% homology or less are included, to calibrate homology detection by the combined amino acid and secondary structure sequence alignments. Combining predicted secondary structure with sequence information results in an 8-15% increase in homology detection within SCOP40 relative to the pairwise alignments using only amino acid sequence data at an error rate of 0.01 errors per query; a 35% increase is observed when the actual secondary structure sequences are used. Incorporating predicted secondary structure information in the analysis of six small genomes yields an improvement in the homology detection of approximately 20% over SSEARCH pairwise alignments, but no improvement in the total number of homologs detected over PSI-BLAST, at an error rate of 0.01 errors per query. However, because the pairwise alignments based on combinations of amino acid and secondary structure similarity are different from those produced by PSI-BLAST and the error rates can be calibrated, it is possible to combine the results of both searches. An additional 25% relative improvement in the number of genes identified at an error rate of 0.01 is observed when the data are pooled in this way. Similarly for the SCOP40 dataset, PSI-BLAST detected 15% of all possible homologs, whereas the pooled results increased the total number of homologs detected to 19%. These results are compared with recent reports of homology detection using sequence profiling methods.
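
To make the pooling step and the reported percentages concrete, here is a small sketch. It assumes that hits from each search are represented as sets of (query, target) pairs that were already accepted at the same calibrated error rate; the function names and the example identifiers are hypothetical and do not come from the paper, and the sketch does not model how the error rate of the pooled list is calibrated.

```python
def pool_hits(hits_a, hits_b):
    """Union of two hit lists, each a collection of (query, target) pairs."""
    return set(hits_a) | set(hits_b)


def relative_improvement(baseline, pooled):
    """Fractional gain of the pooled result over the baseline."""
    return (pooled - baseline) / baseline


# Hypothetical hit lists: the two searches agree on one pair and each
# contributes one pair the other missed.
psi_blast_hits = {("q1", "t1"), ("q2", "t7")}
combined_ss_hits = {("q1", "t1"), ("q3", "t4")}
print(len(pool_hits(psi_blast_hits, combined_ss_hits)))

# SCOP40 figures quoted above: 15% of all possible homologs rising to 19%.
print(round(relative_improvement(15.0, 19.0), 3))
```

Pooling helps only to the extent that the two searches find different homologs, which is why the differing alignments matter. For the SCOP40 figures, the step from 15% to 19% of all possible homologs is a gain of 4 percentage points, or roughly 27% relative to the PSI-BLAST baseline.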

Availability: Secondary structure alignment homepage at http://lutece.rutgers.edu/ssas

Contact: anders@rutchem.rutgers.edu; ronlevy@lutece.rutgers.edu

Supplementary information: Genome sequence/structure alignment results at http://lutece.rutgers.edu/ss_fold_predictions.

Grants and funding

GM-30580/GM/NIGMS NIH HHS/United States
