BMC Bioinformatics. 2011 Jul 5;12:275.
doi: 10.1186/1471-2105-12-275.

AlignHUSH: alignment of HMMs using structure and hydrophobicity information


Oruganty Krishnadev et al. BMC Bioinformatics. 2011.

Abstract

Background: Sensitive remote homology detection and accurate alignments, especially in the midnight zone of sequence similarity, are needed for better function annotation and structural modeling of proteins. An algorithm for HMM-HMM alignment, AlignHUSH, has been developed that is capable of recognizing distantly related domain families. The method uses structural information, in the form of predicted secondary structure probabilities, and the hydrophobicity of amino acids to align HMMs of two sets of aligned sequences. The effect of using information from adjoining column(s) has also been investigated and is found to increase the sensitivity of HMM-HMM alignments and remote homology detection.
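
As a rough illustration of how the three signals named above (emission probabilities, predicted secondary structure probabilities and hydrophobicity) and the adjoining-column information might be combined into one column-pair score, here is a minimal Python sketch. It is not the AlignHUSH scoring function, which the abstract does not specify. The dot-product similarity terms, the weights, the window averaging, the column layout (a dict with "emis" and "ss" keys) and the choice of the Kyte-Doolittle hydropathy scale are all assumptions made for illustration.

```python
# Kyte-Doolittle hydropathy scale (a standard scale; its use here is an assumption).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
HYD_RANGE = 9.0  # max(KD) - min(KD), used to normalise the hydrophobicity term


def mean_hydrophobicity(emis):
    """Expected hydropathy of a column given its emission probabilities."""
    return sum(p * KD[aa] for aa, p in emis.items())


def column_score(col_a, col_b, w_emis=1.0, w_ss=0.5, w_hyd=0.5):
    """Hypothetical weighted score for one pair of HMM match columns.

    Each column is a dict with "emis" (amino acid -> probability) and
    "ss" (probabilities of helix, strand, coil). Weights are illustrative.
    """
    emis = sum(p * col_b["emis"].get(aa, 0.0) for aa, p in col_a["emis"].items())
    ss = sum(x * y for x, y in zip(col_a["ss"], col_b["ss"]))
    diff = abs(mean_hydrophobicity(col_a["emis"]) - mean_hydrophobicity(col_b["emis"]))
    hyd = 1.0 - diff / HYD_RANGE
    return w_emis * emis + w_ss * ss + w_hyd * hyd


def windowed_score(hmm_a, hmm_b, i, j, half_window=1):
    """Average column_score over adjoining column pairs (i+k, j+k) that exist."""
    total, n = 0.0, 0
    for k in range(-half_window, half_window + 1):
        if 0 <= i + k < len(hmm_a) and 0 <= j + k < len(hmm_b):
            total += column_score(hmm_a[i + k], hmm_b[j + k])
            n += 1
    return total / n


# Two toy columns: a hydrophobic helical one and an acidic coil one.
helix_leu = {"emis": {"L": 0.7, "I": 0.3}, "ss": (0.8, 0.1, 0.1)}
coil_asp = {"emis": {"D": 0.6, "E": 0.4}, "ss": (0.1, 0.2, 0.7)}
```

In this sketch a column pair that agrees in residue preference, predicted secondary structure and hydrophobicity scores higher than a pair that disagrees, and the windowed variant lowers the score of a matching pair whose neighbours do not match.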

Results: We have assessed the performance of AlignHUSH using known evolutionary relationships available in SCOP. AlignHUSH performs better than the best HMM-HMM alignment methods and is observed to be even more sensitive at higher error rates. Accuracy of the alignments obtained using AlignHUSH has been assessed using the structure-based alignments available in BaliBASE. The alignment length and the alignment quality are found to be appropriate for homology modeling and function annotation. The alignment accuracy is found to be comparable to existing methods for profile-profile alignments.

Conclusions: A new method to align HMMs has been developed and is shown to have better sensitivity at error rates of 10% and above when compared to other available programs. The proposed method could help in obtaining clues to the functions of proteins of as yet unknown function. A web server incorporating the AlignHUSH method is available at http://crick.mbu.iisc.ernet.in/~alignhush/
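
The comparison of sensitivity at a given error rate can be sketched as follows. This is a minimal Python example under stated assumptions, not the evaluation protocol of the paper. The error rate is taken here to be the fraction of false positives among all hits accepted up to a cutoff, and sensitivity to be the fraction of known true relationships recovered. The function name and the (E-value, is_true) hit format are hypothetical.

```python
def sensitivity_at_error_rate(hits, total_true, max_error_rate):
    """Best sensitivity reachable at any E-value cutoff whose error rate
    does not exceed max_error_rate.

    hits: iterable of (evalue, is_true) pairs, one per reported family pair.
    total_true: number of true relationships in the benchmark.
    Assumed definitions: error rate = FP / (TP + FP); sensitivity = TP / total_true.
    """
    best = 0.0
    tp = fp = 0
    # Walk the hits from most to least significant (lowest E-value first).
    for _evalue, is_true in sorted(hits, key=lambda h: h[0]):
        if is_true:
            tp += 1
        else:
            fp += 1
        if fp / (tp + fp) <= max_error_rate:
            best = max(best, tp / total_true)
    return best


# Toy ranked list: three true hits and two false hits, four true pairs overall.
toy_hits = [(1e-10, True), (1e-5, True), (0.01, False), (0.5, True), (3.0, False)]
strict = sensitivity_at_error_rate(toy_hits, total_true=4, max_error_rate=0.0)
relaxed = sensitivity_at_error_rate(toy_hits, total_true=4, max_error_rate=0.25)
```

With the toy list, allowing a higher error rate admits a weaker true hit and so raises the sensitivity, which is the kind of trade-off the sensitivity versus error rate comparison captures.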


Figures

Figure 1
Comparison of the performance of the AlignHUSH method with HHSearch and PRC. A) The sensitivity and error rate values for both AlignHUSH and HHSearch are plotted in this figure. The sensitivity of AlignHUSH is better than that of HHSearch or PRC at almost all error rates. 'no_sec', 'no_hyd' and 'no_neigh' are variants of the AlignHUSH procedure without the use of secondary structure, hydrophobicity and neighboring column information, respectively. B) Alignment accuracy of the three methods that have been examined in detail in the main text. The alignment accuracy given in this plot corresponds to the 'developer score' defined in the main text. The three methods are comparable as far as accuracy by the developer score is concerned. C) The alignment accuracy of the three methods using the 'modeller score' defined in the main text. The performance of AlignHUSH is slightly better than that of HHSearch and PRC. HHSearch-generated alignments tend to be very short, and hence HHSearch has a low 'modeller score' alignment accuracy. D) The length of the query HMM covered by the alignment is plotted for alignments between homologous families (two SCOP families belonging to the same SCOP superfamily). The coverage of the query HMM is greater for AlignHUSH than for HHSearch, which indicates that AlignHUSH-generated alignments are more informative for function annotation, since they cover almost the entire homologous region. The alignment length coverage is very similar between the PRC-generated and AlignHUSH-generated alignments.
Figure 2
Overlap between the related SCOP families found as hits with E-values better than 10 by the three methods studied in this paper. The numbers given along with the name of each method are the true relationships found uniquely by that method. The numbers given in the overlap regions are the relationships found using one or more of the methods. Figure generated using the web tool from http://www.cs.kent.ac.uk/people/staff/pjr/EulerVennCircles/EulerVennApplet.html.
Figure 3
Examples of two pairs of proteins with structural similarity between profiles that can be considered false positives according to the SCOP definition. A) Two proteins belonging to two different SCOP folds. The similarity in structure is evident from the figure and is also noted in the SCOP database. The structure on the right is the N-terminal part of 1KU9 and the structure on the left is 1BM8 (winged helix domain). The inset shows the full-length proteins and the foreground picture shows the superposition of the part of each protein suggested to be homologous by AlignHUSH. B) Two proteins belonging to different SCOP folds in the same class. Visual inspection does not bring out the similarity between the two proteins, and perhaps this is why they are classified into two different folds by SCOP. The DALI Z score between the two proteins is around 8.0, covering 150 residues with an RMSD of 3.3 Å.
Figure 4
Assessment of function annotation transfer between the DUF925 family and the Nucleotidyl transferase family (d.218.1.4). a) The structural alignment between two proteins in the SCOP superfamily of Nucleotidyltransferase, 1miv (shown in blue) and 7icq (shown in green). The active site residues in 1miv are shown in red, and the active site residues of 7icq are shown in blue, both in the stick format. The nucleotide binding residues in 1miv (Asp40 and Asp42) are seen to be aligned with the nucleotide binding residues of 7icq (Asp190 and Asp192). The active site is mostly conserved, but there are some differences between the two proteins which could perhaps explain the different substrates and mechanism employed by the two proteins. b) The alignment between the DUF925 family proteins and the SCOP family d.218.1.4 proteins. The alignment shows that the Aspartate residues important for binding nucleotides are conserved across the two families, but the conservation of other active site residues is not observed. Alignment figure generated using Jalview [39] and structure figure generated using PyMOL [40].

References

    1. Pei J. Multiple protein sequence alignment. Curr Opin Struct Biol. 2008;18:382–386. doi: 10.1016/j.sbi.2008.03.007.
    2. Moult J. A decade of CASP: progress, bottlenecks and prognosis in protein structure prediction. Curr Opin Struct Biol. 2005;15:285–289. doi: 10.1016/j.sbi.2005.05.011.
    3. Bhadra R, Srinivasan N, Pandit SB. A new domain family in the superfamily of alkaline phosphatases. In Silico Biol. 2005;5:379–387.
    4. Kuzniar A, van Ham RC, Pongor S, Leunissen JA. The quest for orthologs: finding the corresponding gene across genomes. Trends Genet. 2008;24:539–551. doi: 10.1016/j.tig.2008.08.009.
    5. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–3402. doi: 10.1093/nar/25.17.3389.
