Aligning multiple genomic sequences with the threaded blockset aligner

doi:10.1101/gr.1933104

Mathieu Blanchette¹, W James Kent, Cathy Riemer, Laura Elnitski, Arian F A Smit, Krishna M Roskin, Robert Baertsch, Kate Rosenbloom, Hiram Clawson, Eric D Green, David Haussler, Webb Miller

PMID: 15060014
PMCID: PMC383317
DOI: 10.1101/gr.1933104

Genome Res. 2004 Apr;14(4):708-15.

Affiliation

¹ Howard Hughes Medical Institute, University of California at Santa Cruz, Santa Cruz, California 95064, USA.

Abstract

We define a "threaded blockset," which is a novel generalization of the classic notion of a multiple alignment. A new computer program called TBA (for "threaded blockset aligner") builds a threaded blockset under the assumption that all matching segments occur in the same order and orientation in the given sequences; inversions and duplications are not addressed. TBA is designed to be appropriate for aligning many, but by no means all, megabase-sized regions of multiple mammalian genomes. The output of TBA can be projected onto any genome chosen as a reference, thus guaranteeing that different projections present consistent predictions of which genomic positions are orthologous. This capability is illustrated using a new visualization tool to view TBA-generated alignments of vertebrate Hox clusters from both the mammalian and fish perspectives. Experimental evaluation of alignment quality, using a program that simulates evolutionary change in genomic sequences, indicates that TBA is more accurate than earlier programs. To perform the dynamic-programming alignment step, TBA runs a stand-alone program called MULTIZ, which can be used to align highly rearranged or incompletely sequenced genomes. We describe our use of MULTIZ to produce the whole-genome multiple alignments at the Santa Cruz Genome Browser.

Figures

**Figure 1**
(A) Blocks (alignments) of a hypothetical threaded blockset for sequences h (400 bp), m (400 bp) and r (350 bp). Only the range of positions in each alignment is given. (B) Projection of the threaded blockset onto m.
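
The idea in Figure 1 can be sketched in a few lines of Python. This is a minimal illustration, not TBA's implementation: it assumes that a threaded blockset covers every position of every sequence exactly once, stores only the range each sequence occupies in a block (as the caption does), and treats projection as selecting the blocks that contain the reference and ordering them along it. The `Block` class, the function names, and the toy coordinates are all hypothetical and are not the values shown in the figure.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Block:
    # Hypothetical representation: sequence name -> half-open, 0-based
    # interval [start, end) occupied by that sequence in this block.
    rows: Dict[str, Tuple[int, int]]


def covers_exactly_once(blocks: List[Block], lengths: Dict[str, int]) -> bool:
    """Check the assumed covering property: the intervals of each
    sequence tile [0, length) with no gaps and no overlaps."""
    for name, length in lengths.items():
        intervals = sorted(b.rows[name] for b in blocks if name in b.rows)
        pos = 0
        for start, end in intervals:
            if start != pos or end <= start:
                return False
            pos = end
        if pos != length:
            return False
    return True


def project(blocks: List[Block], ref: str) -> List[Block]:
    """Simplified projection: keep blocks containing `ref`,
    ordered by their start position in `ref`."""
    kept = [b for b in blocks if ref in b.rows]
    return sorted(kept, key=lambda b: b.rows[ref][0])


# Toy data (made up for illustration; not the coordinates of Figure 1).
lengths = {"h": 10, "m": 10, "r": 8}
blocks = [
    Block({"m": (4, 10), "r": (5, 8)}),
    Block({"h": (0, 4), "m": (0, 4)}),
    Block({"h": (4, 10), "r": (0, 5)}),
]

on_m = project(blocks, "m")
```

Because the same set of blocks underlies every projection, projecting onto `h` or `r` instead of `m` reorders and filters the blocks but never changes which positions are aligned to each other.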

**Figure 2**
(A) Alignments between the chloroplast genomes of *Arabidopsis thaliana* and *Oenothera elata* (evening primrose). Lines running from lower *left* to upper *right* indicate positions of matches on the forward strand (relative to the GenBank entries, NC_000932 and OEL271079, respectively), and lines running from upper *left* to lower *right* indicate matches in reverse complement. The alignments were computed and displayed by programs used by the PipMaker Web server (Schwartz et al. 2000). (B) Blocks of a threaded blockset for the chloroplast genomes of *Arabidopsis* and evening primrose.

**Figure 3**
A threaded blockset for vertebrate *HoxA* regions, displayed in our interactive blockset viewer Gmaj. (A) The red circle marks a position of interest where the tilapia reference sequence aligns with human. The block containing this position is highlighted in red in all of the alignment panels. Color underlays are blue for exons in the reference sequence and yellow for introns, and the exons are also represented as icons above the alignments. At the *top* of the Gmaj window, two status lines describe the positions of the mouse pointer and the red circle, respectively. Individual nucleotides for the selected block are displayed in the *bottom* pane, with the marked position highlighted. (B) The same region projected onto the human sequence. The underlays for human are green for EST evidence, dark blue for antisense RNA, and red for coding sequences. The conserved element from A is part of an alternative 5′-end identified by homology to a human EST from TIGR.

**Figure 4**
(A) Accuracy of the multiple alignments produced by different aligners on a set of nine simulated mammalian sequences of length ∼50 kb, as measured on the basis of the pairwise alignments induced by different pairs of species. The scores reported are the average of 50 simulation experiments. See the Methods section (Supplemental material) for an explanation of the R parameter. (B) Accuracy of the multiple alignments produced by different aligners on simulated human, mouse, and rat sequences of length ∼50 kb, as measured on the basis of the pairwise alignments induced by different pairs of species. The scores reported are the average of 50 simulation experiments.
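
To make "the pairwise alignments induced by different pairs of species" concrete, the sketch below extracts the aligned position pairs that a multiple alignment implies for two of its rows and compares them with a known true alignment. The agreement measure used here (the fraction of truly aligned pairs that the prediction recovers) is an assumption chosen for illustration and is not necessarily the score reported in the figure; the function names and the toy sequences are hypothetical.

```python
from typing import Dict, Set, Tuple


def induced_pairs(rows: Dict[str, str], a: str, b: str) -> Set[Tuple[int, int]]:
    """Return the set of (position in a, position in b) pairs that the
    gapped rows of a multiple alignment place in the same column."""
    pairs = set()
    i = j = 0  # ungapped coordinates in sequences a and b
    for ca, cb in zip(rows[a], rows[b]):
        if ca != "-" and cb != "-":
            pairs.add((i, j))
        if ca != "-":
            i += 1
        if cb != "-":
            j += 1
    return pairs


def agreement(true_rows: Dict[str, str], predicted_rows: Dict[str, str],
              a: str, b: str) -> float:
    """Fraction of truly aligned pairs that the predicted alignment also
    aligns (an assumed measure, used only for this sketch)."""
    truth = induced_pairs(true_rows, a, b)
    predicted = induced_pairs(predicted_rows, a, b)
    return len(truth & predicted) / len(truth) if truth else 1.0


# Toy example: both alignments contain the same ungapped sequences.
true_rows = {"x": "ACG-T", "y": "A-GCT"}
predicted_rows = {"x": "ACG-T", "y": "AG-CT"}

score = agreement(true_rows, predicted_rows, "x", "y")
```

With simulated sequences the true alignment is known by construction, which is what makes this kind of pairwise comparison possible.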

**Figure 5**
Pictorial representation of an application of MULTIZ. M is a human-ref blockset of human, mouse, and rat, whereas N is a cow-ref blockset of cow and dog. MULTIZ uses a pairwise human-ref blockset, G, of human and cow to guide the aligning process. The output is a human-ref blockset of human, mouse, rat, cow, and dog. The reference sequence for each blockset is indicated by capital letters.

**Figure 6**
UCSC Genome Browser display of HUMOR alignments. (A) Ribosomal protein RPL31. The human/mouse/rat track shows the MULTIZ score normalized as described in the text. The high conservation of exons relative to introns is typical of many genes. (B) Transcription Factor FOS. In highly regulated genes such as this one, it is not unusual to find extensive conservation outside of protein-coding exons. (C) Closeup of a poorly conserved part of an RPL31 intron. When the display is zoomed in close enough, the base-by-base alignment is displayed as well as the score graph. Because the alignment is projected onto the reference sequence, a “Hidden Gaps” row indicates areas where in the full alignment there would be dashes in the reference sequence row. Clicking on the human/mouse/rat track takes you to a details page that displays the full alignment. (D) Closeup of an exon/intron boundary in FOS. The canonical “GT” 5′ consensus sequence is usually conserved, but then conservation falls off for the rest of the intron.
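
The "Hidden Gaps" row in panel C follows from how a block is projected onto a reference: columns in which the reference row holds a dash are dropped from the display. The sketch below shows one way to do this and to record where columns were dropped. It is an illustration only, not the Genome Browser's code; the function name, the return format, and the toy alignment are assumptions.

```python
from typing import Dict, List, Tuple


def project_onto_reference(
    rows: Dict[str, str], ref: str
) -> Tuple[Dict[str, str], List[Tuple[int, int]]]:
    """Drop every column where the reference row has a dash.

    Returns the projected rows and a list of (ref_pos, n_columns) entries,
    where ref_pos is the number of reference bases to the left of the
    dropped run and n_columns is how many columns were hidden there.
    """
    ref_row = rows[ref]
    keep = [i for i, c in enumerate(ref_row) if c != "-"]
    projected = {
        name: "".join(seq[i] for i in keep) for name, seq in rows.items()
    }

    hidden = []
    ref_pos = 0  # reference bases seen so far
    run = 0      # length of the current run of reference dashes
    for c in ref_row:
        if c == "-":
            run += 1
        else:
            if run:
                hidden.append((ref_pos, run))
                run = 0
            ref_pos += 1
    if run:
        hidden.append((ref_pos, run))
    return projected, hidden


# Toy block (made up for illustration).
full = {"human": "AC--GT-A", "mouse": "ACTTGTCA", "rat": "A-TTG-CA"}
projected, hidden = project_onto_reference(full, "human")
```

In this toy block, two columns are hidden after the second human base and one after the fourth, which is the kind of information a hidden-gaps indicator would need to convey.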

References

1. Aparicio, S., Chapman, J., Stupka, E., Putnam, N., Chia, J.M., Dehal, P., Christoffels, A., Rash, S., Hoon, S., Smit, A., et al. 2002. Whole-genome shotgun assembly and analysis of the genome of *Fugu rubripes*. Science 297: 1301-1310.
2. Bray, N. and Pachter, L. 2003. MAVID multiple alignment server. Nucleic Acids Res. 31: 3525-3526.
3. Brudno, M. and Morgenstern, B. 2002. Fast and sensitive alignment of large genomic sequences. In Proceedings of the IEEE Computer Society Bioinformatics Conference, pp. 138-150. IEEE Press.
4. Brudno, M., Do, C.B., Cooper, G.M., Kim, M.F., Davydov, E., Green, E.D., Sidow, A., Batzoglou, S., and NISC Comparative Sequencing Program. 2003. LAGAN and Multi-LAGAN: Efficient tools for large-scale multiple alignment of genomic DNA. Genome Res. 13: 721-731.
5. Collins, F.S., Green, E.D., Guttmacher, A.E., and Guyer, M.S. 2003. A vision for the future of genomics research. Nature 422: 835-847.

WEB SITE REFERENCES

1. http://bio.cse.psu.edu/; TBA, simulated test data, and the Gmaj visualization tool.
2. http://genome.ucsc.edu; MULTIZ and HUMOR alignments.

Grants and funding

R01 HG002238/HG/NHGRI NIH HHS/United States
