MultiSeq: unifying sequence and structure data for evolutionary analysis

doi:10.1186/1471-2105-7-382

BMC Bioinformatics. 2006 Aug 16;7:382.

Elijah Roberts¹, John Eargle, Dan Wright, Zaida Luthey-Schulten

PMID: 16914055
PMCID: PMC1586216
DOI: 10.1186/1471-2105-7-382

Elijah Roberts et al. BMC Bioinformatics. 2006.

Affiliation

¹ Center for Biophysics and Computational Biology, University of Illinois at Urbana-Champaign, Urbana, IL, USA. erobert3@scs.uiuc.edu

Abstract

Background: Since the publication of the first draft of the human genome in 2000, bioinformatic data have been accumulating at an overwhelming pace. Currently, more than 3 million sequences and 35 thousand structures of proteins and nucleic acids are available in public databases. Finding correlations in and between these data to answer critical research questions is extremely challenging. This problem needs to be approached from several directions: information science to organize and search the data; information visualization to assist in recognizing correlations; mathematics to formulate statistical inferences; and biology to analyze chemical and physical properties in terms of sequence and structure changes.

Results: Here we present MultiSeq, a unified bioinformatics analysis environment that allows one to organize, display, align and analyze both sequence and structure data for proteins and nucleic acids. While special emphasis is placed on analyzing the data within the framework of evolutionary biology, the environment is also flexible enough to accommodate other usage patterns. The evolutionary approach is supported by the use of predefined metadata, adherence to standard ontological mappings, and the ability for the user to adjust these classifications using an electronic notebook. MultiSeq contains a new algorithm to generate complete evolutionary profiles that represent the topology of the molecular phylogenetic tree of a homologous group of distantly related proteins. The method, based on the multidimensional QR factorization of multiple sequence and structure alignments, removes redundancy from the alignments and orders the protein sequences by increasing linear dependence, resulting in the identification of a minimal basis set of sequences that spans the evolutionary space of the homologous group of proteins.
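The ordering idea behind the QR step can be illustrated with a small sketch. This is not the MultiSeq implementation: it is a simplified, single-matrix stand-in that one-hot encodes already-aligned sequences and applies column-pivoted Gram-Schmidt, which yields the same ordering a pivoted QR factorization would. The encoding, the function names `encode`, `qr_order`, and `nonredundant`, and the `cutoff` parameter are all assumptions made for illustration; the published method uses a multidimensional QR factorization with its own encoding and thresholds.

```python
import math

# Assumed alphabet: 20 amino acids plus a gap character.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"


def encode(seq):
    """One-hot encode an aligned sequence into a flat numeric vector."""
    vec = []
    for ch in seq:
        vec.extend(1.0 if ch == a else 0.0 for a in ALPHABET)
    return vec


def qr_order(seqs, tol=1e-9):
    """Order sequences by increasing linear dependence.

    Greedy column pivoting: repeatedly pick the sequence whose residual
    (the part not explained by sequences already chosen) has the largest
    norm, then project that direction out of the remaining residuals.
    Returns (order, norms), where norms[k] is the residual norm of the
    k-th chosen sequence at the moment it was picked.
    """
    resid = [encode(s) for s in seqs]
    remaining = list(range(len(seqs)))
    order, norms = [], []
    while remaining:
        best = max(remaining, key=lambda i: sum(x * x for x in resid[i]))
        norm = math.sqrt(sum(x * x for x in resid[best]))
        order.append(best)
        norms.append(norm)
        remaining.remove(best)
        if norm > tol:
            q = [x / norm for x in resid[best]]
            for i in remaining:
                proj = sum(a * b for a, b in zip(q, resid[i]))
                resid[i] = [a - proj * b for a, b in zip(resid[i], q)]
    return order, norms


def nonredundant(seqs, cutoff):
    """Keep sequences whose residual norm at selection time is >= cutoff."""
    order, norms = qr_order(seqs)
    return [i for i, n in zip(order, norms) if n >= cutoff]


if __name__ == "__main__":
    alignment = ["ACDE", "ACDE", "WYWY", "ACDF"]  # toy aligned sequences
    print(qr_order(alignment))
    print(nonredundant(alignment, cutoff=0.5))
```

In this toy alignment the exact duplicate of the first sequence ends up last in the ordering with a residual of zero, so any positive cutoff excludes it from the non-redundant set, while the near-duplicate `ACDF` is kept or dropped depending on how strict the cutoff is.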

Conclusion: MultiSeq is a major extension of the Multiple Alignment tool that is provided as part of VMD, a structural visualization program for analyzing molecular dynamics simulations. Both are freely distributed by the NIH Resource for Macromolecular Modeling and Bioinformatics and MultiSeq is included with VMD starting with version 1.8.5. The MultiSeq website has details on how to download and use the software: http://www.scs.uiuc.edu/~schulten/multiseq/

Figures

**Figure 1**
**MultiSeq Overview**. Overview of the MultiSeq environment showing aligned sequence and structural data. (1) 1D representation of structural data colored by structural conservation. (2) 1D representation of sequence data colored by sequence identity. (3) 3D representation of structural data colored by structural conservation, as shown by VMD. For structural data, the coloring is synchronized between the 1D representation and the 3D representation.

**Figure 2**
**BLAST Results Viewer**. BLAST search results viewer showing the outcome of a BLAST search. (1) The name of the matching sequence is shown along with (2) the expectation value of the match. (3) The BLAST-aligned regions are shown as an MSA; non-matched regions on either side of the aligned region are shown grayed out. (4) The search results can be filtered by BLAST e-score, taxonomy, or sequence QR-based redundancy.

**Figure 3**
**Grouping**. [A] Grouping in MultiSeq. (1) Group headers show the name of the group and allow the user to manage the group. (2) The status bar shows summary information about the group. [B] MultiSeq allows data to be automatically grouped by taxonomic classification. (3) The taxonomy dialog allows the user to select the level of taxonomy by which to group the data. (4) Taxonomic information about the data is then used to create the groupings.

**Figure 4**
**MultiSeq Tools**. (1) The electronic notebook displays various metadata associated with the sequence and also provides space for making annotations about a sequence. Changes will be saved in the MultiSeq session. (2) The phylogenetic tree viewer shows evolutionary relationships amongst the data. Data are labeled by species name and colored by domain of life; those highlighted in yellow are part of the selected non-redundant set. (3) The QR ordering of the non-redundant set is also displayed; lower numbers indicate data that are more linearly independent. (4) The plotter allows a metric to be plotted along the length (or a subset) of the sequence. All of the coloring metrics can also be used by the plotter.

Cited by

Characterization of Danio rerio Mn2+-dependent ADP-ribose/CDP-alcohol diphosphatase, the structural prototype of the ADPRibase-Mn-like protein family.
Rodrigues JR, Fernández A, Canales J, Cabezas A, Ribeiro JM, Costas MJ, Cameselle JC. PLoS One. 2012;7(7):e42249. doi: 10.1371/journal.pone.0042249. Epub 2012 Jul 27. PMID: 22848751. Free PMC article.
Catalytic transitions in the human MDR1 P-glycoprotein drug binding sites.
Wise JG. Biochemistry. 2012 Jun 26;51(25):5125-41. doi: 10.1021/bi300299z. Epub 2012 Jun 12. PMID: 22647192. Free PMC article.
Emergence of the universal genetic code imprinted in an RNA record.
Hohn MJ, Park HS, O'Donoghue P, Schnitzbauer M, Söll D. Proc Natl Acad Sci U S A. 2006 Nov 28;103(48):18095-100. doi: 10.1073/pnas.0608762103. Epub 2006 Nov 16. PMID: 17110438. Free PMC article.
Structure of an archaeal non-discriminating glutamyl-tRNA synthetase: a missing link in the evolution of Gln-tRNAGln formation.
Nureki O, O'Donoghue P, Watanabe N, Ohmori A, Oshikane H, Araiso Y, Sheppard K, Söll D, Ishitani R. Nucleic Acids Res. 2010 Nov;38(20):7286-97. doi: 10.1093/nar/gkq605. Epub 2010 Jul 3. PMID: 20601684. Free PMC article.
A structural analysis of the AAA+ domains in Saccharomyces cerevisiae cytoplasmic dynein.
Gleave ES, Schmidt H, Carter AP. J Struct Biol. 2014 Jun;186(3):367-75. doi: 10.1016/j.jsb.2014.03.019. Epub 2014 Mar 28. PMID: 24680784. Free PMC article.
