Comparative Study
Genome Res. 2005 Jan;15(1):184-94.
doi: 10.1101/gr.3007205. Epub 2004 Dec 8.

Mulan: multiple-sequence local alignment and visualization for studying function and evolution

Ivan Ovcharenko et al. Genome Res. 2005 Jan.

Abstract

Multiple-sequence alignment analysis is a powerful approach for understanding phylogenetic relationships, annotating genes, and detecting functional regulatory elements. With a growing number of partly or fully sequenced vertebrate genomes, effective tools for performing multiple comparisons are required to accurately and efficiently assist biological discoveries. Here we introduce Mulan (http://mulan.dcode.org/), a novel method and a network server for comparing multiple draft and finished-quality sequences to identify functional elements conserved over evolutionary time. Mulan brings together several novel algorithms: the TBA multi-aligner program for rapid identification of local sequence conservation, and the multiTF program for detecting evolutionarily conserved transcription factor binding sites in multiple alignments. In addition, Mulan supports two-way communication with the GALA database; alignments of multiple species dynamically generated in GALA can be viewed in Mulan, and conserved transcription factor binding sites identified with Mulan/multiTF can be integrated and overlaid with extensive genome annotation data using GALA. Local multiple alignments computed by Mulan ensure reliable representation of short- and large-scale genomic rearrangements in distant organisms. Mulan allows for interactive modification of critical conservation parameters to differentially predict conserved regions in comparisons of both closely and distantly related species. We illustrate the uses and applications of the Mulan tool through multispecies comparisons of the GATA3 gene locus and the identification of elements that are conserved in a different way in avians than in other genomes, allowing speculation on the evolution of birds. Source code for the aligners and the aligner-evaluation software can be freely downloaded from http://www.bx.psu.edu/miller_lab/.
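The conservation parameters mentioned in the abstract appear in the figure captions as an identity threshold paired with a minimum length (for example, >70% identity over >80 bp). The following is a minimal sketch of how such a pair of thresholds could be applied to one gapped pairwise alignment using a sliding window. The function name `find_conserved_regions`, its arguments, and the window-merging rule are illustrative assumptions, not the procedure implemented in Mulan.

```python
def find_conserved_regions(ref, other, min_identity=0.70, min_length=80):
    """Return merged (start, end) alignment-column intervals, end exclusive.

    Hypothetical helper: a window of `min_length` columns passes when its
    fraction of identical, non-gap columns exceeds `min_identity`.
    Overlapping or adjacent passing windows are merged into one region.
    """
    if len(ref) != len(other):
        raise ValueError("aligned sequences must have equal length")
    n = len(ref)
    if n < min_length:
        return []

    # 1 where both rows carry the same base; gap columns never count as matches.
    match = [int(a == b and a != "-") for a, b in zip(ref.upper(), other.upper())]

    regions = []
    window_sum = sum(match[:min_length])
    for start in range(n - min_length + 1):
        if start > 0:
            # Slide the window one column to the right.
            window_sum += match[start + min_length - 1] - match[start - 1]
        if window_sum > min_identity * min_length:
            end = start + min_length
            if regions and start <= regions[-1][1]:
                regions[-1] = (regions[-1][0], end)  # extend the previous region
            else:
                regions.append((start, end))
    return regions
```

Because passing windows are merged, a reported region can be longer than `min_length`, and its overall identity can fall slightly below the per-window threshold. Lowering `min_identity` suits distantly related species, while raising it suits close relatives, which mirrors the interactive adjustment the abstract describes.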

Figures

Figure 1.
Mulan contig ordering based on homology to the reference sequence. The top layer of shaded lines indicates the location of contigs from a second sequence aligned to the base sequence, where red triangles pointing to the right specify forward-strand alignments, and purple triangles pointing to the left correspond to reverse-strand alignments. Contig names are indicated in square brackets. The JF2-73M16 chicken BAC clone (http://www.jgi.doe.gov/) consisting of three contigs was aligned to the chicken genome (chr28:4,000,000–4,200,000).
Figure 2.
Stacked-pairwise conservation profile for a 13-kb region from the GATA3 locus. Color-gradient visualization distinguishes regions that are conserved to different extents across the input sequences (A). The color intensity of a conserved region depends on the number of species that contain it: the darker the color, the more species share the region. Only ECRs conserved in at least six of the seven secondary species are highlighted in the alignment (B). Intergenic regions are in red, intronic regions in pink, coding exons in blue, and UTRs in yellow.
Figure 3.
Phylogenetic shadowing option of the Mulan tool. ApoB region sequences from 14 primates were compared to determine the phylogenetic relationship (A) and to visualize the conservation by stacked pairwise display (B). The phylogenetic shadowing conservation profile distinguishes the ApoB coding exon from the neutrally evolving background (C). ECR parameters used for detecting exons: >85% identity; >100 bp.
Figure 4.
Agreement scores of pairwise and multiple alignments produced by different aligners on a set of nine simulated mammalian sequences of length ∼50 kb. Pairwise results from BLASTZ were postprocessed to remove overlapping regions. Multiple aligners, including refine and TBA, use the same pairwise alignments. TBA_5 refers to alignments from TBA scored with an agreement measure that tolerates mismatches within five base positions. Agreement scores of multiple aligners are measured from the pairwise alignments induced by pairs of species. All values are averaged over 50 sets of simulated sequences. Parameters used in the simulation and alignment programs are described in the text.
Figure 5.
Mulan phylogenetic tree (A) and sequence conservation profile (B) for the GATA3 gene locus from human, rat, mouse, chicken, frog, and three fish genomes. Each tree branch indicates the number of nucleotide substitutions from the closest node. Noncoding ECRs conserved (>70% identity; >80 bp) in at least four species (including human) are shaded and numbered ECR1–5. Coding exons are in blue, UTRs in yellow, intergenic elements in red, and intronic in pink. ECRs are depicted as dark red bars above each pairwise alignment. Repetitive elements are depicted as green boxes on the bottom axis. Alignments resulting from the reverse strand are shaded in gray, and blocks on the forward and reverse strands can be visualized in a dot-plot between the zebrafish and the human local alignment (C).
Figure 6.
multiTF visualization of a CRE-BP1 transcription factor binding site detected in the GATA3 locus, overlaid with the conservation profile of this locus as constructed with human, mouse, rat, frog, Fugu, Tetraodon, and zebrafish sequences. The bottom panel represents a 60-bp-long alignment for the ECR3 core region that contains the CRE-BP1 binding site (blue) shared by all the species.
Figure 7.
Conservation of ZFPM1 among human, mouse, rat, and mouse, using TBA at the Mulan server. The large introns have several highly conserved regions. Those with conserved GATA-1 binding sites and high regulatory potential (predicted CRMs) are indicated by a set of purple and red blocks under the gene, demarcated by star symbols. The two blocks shown in red are positive for GATA-1 binding in erythroid cells, as assayed by chromatin immunoprecipitation (Welch et al. 2004).
Figure 8.
Schematic visualization of the multiTF method of identifying TFBSs shared by multiple species. Blue font color indicates a TFBS with the consensus sequence of [t/g/a]GG[g/a]CTGT[g/c] that would be detected by multiTF. Light-red shading highlights one of the anchor nucleotides for this binding site detection.
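The idea in the Figure 8 caption, a binding site that counts only when every species carries it, can be illustrated with a short sketch. It converts the bracketed consensus from the caption into a regular expression and reports the alignment columns at which all rows match. The function names, the dictionary input, and the restriction to gap-free sites are assumptions made for illustration; the sketch does not reproduce multiTF's anchor-nucleotide search.

```python
import re


def consensus_to_regex(consensus):
    """Compile a consensus such as '[t/g/a]GG[g/a]CTGT[g/c]' into a regex."""
    pattern = re.sub(
        r"\[([^\]]+)\]",
        lambda m: "[" + m.group(1).replace("/", "") + "]",
        consensus,
    )
    return re.compile(pattern, re.IGNORECASE)


def consensus_length(consensus):
    """Number of sequence positions the consensus covers."""
    return len(re.sub(r"\[[^\]]+\]", "N", consensus))


def shared_sites(aligned_rows, consensus):
    """Return start columns where every aligned row matches the consensus.

    `aligned_rows` maps a species name to its row of the alignment; all rows
    must have the same length. A site interrupted by a gap ('-') in any row
    is not reported.
    """
    if not aligned_rows:
        return []
    rows = list(aligned_rows.values())
    n = len(rows[0])
    if any(len(row) != n for row in rows):
        raise ValueError("aligned rows must have equal length")

    regex = consensus_to_regex(consensus)
    width = consensus_length(consensus)
    return [
        col
        for col in range(n - width + 1)
        if all(regex.fullmatch(row[col:col + width]) for row in rows)
    ]
```

A usage example with made-up sequences:

```python
rows = {
    "human": "AATGGGCTGTGAA",
    "mouse": "CCAGGACTGTCTT",
}
print(shared_sites(rows, "[t/g/a]GG[g/a]CTGT[g/c]"))  # [2]
```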

References

    1. Aerts, S., Thijs, G., Coessens, B., Staes, M., Moreau, Y., and De Moor, B. 2003. Toucan: Deciphering the cis-regulatory logic of coregulated genes. Nucleic Acids Res. 31: 1753-1764.
    2. Andl, T., Ahn, K., Kairo, A., Chu, E.Y., Wine-Lee, L., Reddy, S.T., Croft, N.J., Cebra-Thomas, J.A., Metzger, D., Chambon, P., et al. 2004. Epithelial Bmpr1a regulates differentiation and proliferation in postnatal hair follicles and is essential for tooth development. Development 131: 2257-2268.
    3. Blanchette, M., Kent, W.J., Riemer, C., Elnitski, L., Smit, A.F., Roskin, K.M., Baertsch, R., Rosenbloom, K., Clawson, H., Green, E.D., et al. 2004. Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res. 14: 708-715.
    4. Boffelli, D., McAuliffe, J., Ovcharenko, D., Lewis, K.D., Ovcharenko, I., Pachter, L., and Rubin, E.M. 2003. Phylogenetic shadowing of primate sequences to find functional regions of the human genome. Science 299: 1391-1394.
    5. Bray, N., Dubchak, I., and Pachter, L. 2003. AVID: A global alignment program. Genome Res. 13: 97-102.

Web site references

    1. http://www.bx.psu.edu/miller_lab/; Source code for the aligners and the aligner-evaluation software.
    2. http://globin.cse.psu.edu/gala/; GALA.
    3. http://mulan.dcode.org; Mulan.
    4. http://rvista.dcode.org/; rVista 2.0.
    5. http://www.jgi.doe.gov/; Joint Genome Institute sequencing facility.
