Retrieval and on-the-fly alignment of sequence fragments from the HIV database
- PMID: 11331235
- DOI: 10.1093/bioinformatics/17.5.415
Abstract
Motivation: The amount of HIV-1 sequence data generated (presently around 42000 sequences, of which more than 22000 are from the V3 region of the viral envelope) presents a challenge for anyone working on the analysis of these data. A major problem is obtaining the region of interest from the stored sequences, which often contain but are not limited to that region. In addition, multiple alignment programs generally cannot deal with the large numbers of sequences that are available for many HIV-1 regions. We set out to provide our users with a tool that will retrieve and create an initial alignment of the HIV sequences that are available for a given genomic region.
Results: The MPAlign (Multiple Pairwise Alignment) web interface is a collection of Perl scripts that retrieves sequences from the Los Alamos HIV sequence database based on a number of search parameters. All sequences are pairwise-aligned to a model sequence using the Hidden Markov Model-based program HMMER. The HMMER model is general enough to accommodate virtually all HIV-1 sequences stored in the database. To create a multiple sequence alignment, gaps are inserted into the sequences during retrieval so that they are aligned to one another. Retrieving and aligning the almost 560 gp120 sequences (each approximately 1500 nt or longer) stored in the database is at least 1500 times faster than a similar Clustal alignment.
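The gap-insertion step described above can be illustrated with a minimal Python sketch. It is not the authors' Perl code, and it does not run or parse HMMER: it assumes each pairwise alignment is already available as a pair of equal-length gapped strings (model, query), and the function names `split_by_reference` and `merge_pairwise` are hypothetical. The idea is that, because every query is aligned to the same model, the widest insertion seen after each model position determines how many gap columns every other sequence needs there.

```python
from typing import List, Tuple

GAP = "-"


def split_by_reference(ref_aln: str, seq_aln: str) -> Tuple[List[str], List[str]]:
    """Split one pairwise alignment into pieces indexed by reference position.

    inserts[i] holds query residues inserted before reference position i
    (inserts[n] holds insertions after the last position); matches[i] is the
    query character (residue or gap) aligned to reference position i.
    """
    if len(ref_aln) != len(seq_aln):
        raise ValueError("aligned strings must have equal length")
    inserts = [""]
    matches: List[str] = []
    for r, s in zip(ref_aln, seq_aln):
        if r == GAP:
            # Gap in the reference: the query carries an insertion here.
            if s != GAP:
                inserts[-1] += s
        else:
            # Reference residue: record the aligned query character and
            # open a new (initially empty) insertion slot after it.
            matches.append(s)
            inserts.append("")
    return inserts, matches


def merge_pairwise(pairs: List[Tuple[str, str]]) -> List[str]:
    """Combine pairwise alignments to one reference into a multiple alignment."""
    split = [split_by_reference(r, s) for r, s in pairs]
    n = len(split[0][1])
    if any(len(m) != n for _, m in split):
        raise ValueError("all alignments must cover the same reference")
    # Width of each insertion slot = longest insertion any sequence has there.
    widths = [max(len(ins[i]) for ins, _ in split) for i in range(n + 1)]
    rows = []
    for ins, matches in split:
        parts = []
        for i in range(n):
            parts.append(ins[i].ljust(widths[i], GAP))  # pad shorter insertions
            parts.append(matches[i])
        parts.append(ins[n].ljust(widths[n], GAP))
        rows.append("".join(parts))
    return rows
```

For example, `merge_pairwise([("AC-GT", "ACTGT"), ("ACGT", "A-GT"), ("A--CGT", "AGGCGT")])` returns three rows of equal length. Insertions are left-justified within their slot and are not aligned to one another, which is consistent with describing the result as an initial alignment. The cost grows linearly with the number of sequences, since each sequence is only ever compared with the model.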