Comparative Study
Genome Res. 2007 Dec;17(12):1797-808.
doi: 10.1101/gr.6761107. Epub 2007 Nov 5.

28-way vertebrate alignment and conservation track in the UCSC Genome Browser

Webb Miller et al. Genome Res. 2007 Dec.

Abstract

This article describes a set of alignments of 28 vertebrate genome sequences that is provided by the UCSC Genome Browser. The alignments can be viewed on the Human Genome Browser (March 2006 assembly) at http://genome.ucsc.edu, downloaded in bulk by anonymous FTP from http://hgdownload.cse.ucsc.edu/goldenPath/hg18/multiz28way, or analyzed with the Galaxy server at http://g2.bx.psu.edu. This article illustrates the power of this resource for exploring vertebrate and mammalian evolution, using three examples. First, we present several vignettes involving insertions and deletions within protein-coding regions, including a look at some human-specific indels. Then we study the extent to which start codons and stop codons in the human sequence are conserved in other species, showing that start codons are in general more poorly conserved than stop codons. Finally, an investigation of the phylogenetic depth of conservation for several classes of functional elements in the human genome reveals striking differences in the rates and modes of decay in alignability. Each functional class has a distinctive period of stringent constraint, followed by decays that allow (for the case of regulatory regions) or reject (for coding regions and ultraconserved elements) insertions and deletions.
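
As an illustration of the start- and stop-codon comparison described in the abstract, the following minimal sketch checks which species in a toy alignment retain an intact codon at the columns of the human codon. The species rows are made-up data, and the function name `codon_conserved` and the rule that any gap or mismatch in the three columns counts as "not conserved" are assumptions for illustration, not the authors' method.

```python
# Hypothetical sketch: which species keep a start (or stop) codon
# at the alignment columns occupied by the human codon?

START_CODONS = {"ATG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codon_conserved(aligned_rows, col, allowed):
    """Return {species: bool} for the three columns starting at `col`.

    aligned_rows maps species name -> aligned sequence (equal lengths,
    '-' for gaps). A species counts as conserved only if its three
    characters in those columns spell a codon in `allowed`.
    """
    result = {}
    for species, row in aligned_rows.items():
        codon = row[col:col + 3].upper()
        result[species] = codon in allowed
    return result


# Made-up toy alignment (not real data); the human start codon is at column 2.
rows = {
    "human": "CCATGGCT",
    "mouse": "CCATGGCA",
    "dog":   "CCACGGCT",
    "frog":  "CC---GCT",
}
start_status = codon_conserved(rows, 2, START_CODONS)
fraction = sum(start_status.values()) / len(start_status)
print(start_status, fraction)
```

Passing `STOP_CODONS` instead of `START_CODONS` would apply the same check at the columns of a human stop codon.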

Figures

Figure 1.
A tree indicating assumed evolutionary relationships among the sequences in the 28-way alignment. Branch lengths are proportional to average number of substitutions per site. Species not previously available in our whole-genome alignments are indicated with an asterisk. Filled circles indicate named clades, such as amniotes and eutherians, that are mentioned in the text. A tree labeled with branch lengths is given in the Supplemental material.
Figure 2.
A 6-bp deletion near the start of PRNP. Species in the 28-way alignment lacking data for this region are not shown. This alignment can be seen in the current (hg18) UCSC Genome Browser at chr20:4,627,867–4,627,880. See Supplemental Figure S2 for an amino acid alignment of the deletion involving many more species.
Figure 3.
Number of inferred coding indels and number per million years on the placental branches leading to human. Estimated time elapsed on each branch is taken from Murphy et al. (2007).
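
The per-branch rate in Figure 3 is a simple ratio of an indel count to the elapsed time on that branch. A minimal sketch of that arithmetic follows; the branch names, counts, and durations are hypothetical placeholders, not values from the paper or from Murphy et al. (2007).

```python
# Hypothetical sketch: indels per million years on each branch.

def indels_per_million_years(indel_count, branch_myr):
    """Divide the number of inferred indels by the branch duration (Myr)."""
    if branch_myr <= 0:
        raise ValueError("branch duration must be positive")
    return indel_count / branch_myr


# Placeholder inputs: (inferred indel count, estimated duration in Myr).
branches = {
    "branch_A": (12, 6.0),
    "branch_B": (30, 20.0),
}
rates = {
    name: indels_per_million_years(count, myr)
    for name, (count, myr) in branches.items()
}
print(rates)
```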
Figure 4.
Extreme conservation of the region around the 3-bp insertion in human SULF1. The symbol “-” indicates that there is no base in the aligning species that aligns to this location. Placement of the gap at the first human E results from tie-breaking rules in the alignment software. The second gap in the lizard sequence was positioned using nucleotide content of codons. hg18.chr8:70,698,769–70,698,840.
Figure 5.
The 6-bp insertion in the human GFM2 gene, showing the location of a 6-bp interval that is absent in some people. The symbol “-” indicates that there is no base in the aligning species that aligns to this location, and “=” indicates that at this location in the aligning species there is a sequence of bases of such different length and/or sequence composition that it cannot be reliably aligned. Sequence for this interval is currently not available for shrew or tenrec. chr5:74,057,590–74,057,630.
Figure 6.
A segment of the gene for PAH, showing positions of some deletions that may be associated with the disease PKU. The CTT deletion (shown in reverse orientation) removes an amino acid whose column has six distinct letters and in that sense is not well conserved. The nucleotide symbol “N” represents an unsequenced base, and “=” indicates that at this location in the aligning species there is a sequence of bases of such different length and/or sequence composition that it cannot be reliably aligned. The two conservation tracks indicate that the deleted position is not well conserved among all vertebrates, but is fairly well conserved within mammals. chr12:101,761,637–101,761,687.
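
The captions for Figures 4 to 6 define three special symbols in the alignment display: "-" (no aligning base), "=" (sequence present but not reliably alignable), and "N" (unsequenced base). The sketch below tallies these categories for one aligned row. The category labels, function names, and the example row are assumptions made for illustration; they are not part of the browser or its file formats.

```python
# Hypothetical sketch: tally the symbol classes described in the captions.
from collections import Counter

BASES = {"A", "C", "G", "T"}


def classify_symbol(ch):
    """Map one alignment character to a descriptive category."""
    if ch == "-":
        return "no_aligning_base"
    if ch == "=":
        return "unalignable_sequence"
    if ch.upper() == "N":
        return "unsequenced_base"
    if ch.upper() in BASES:
        return "base"
    return "other"


def summarize_row(row):
    """Count how many characters of `row` fall into each category."""
    return Counter(classify_symbol(ch) for ch in row)


# Made-up row (not real data).
summary = summarize_row("ACG--=NNTa")
print(dict(summary))
```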
Figure 7.
Phylogenetic extent of the alignment of functional features.
(A) The distributions of alignment scores per column for the subset of intervals in each feature set (coding exons, UCEs, putative transcriptional regulatory regions, and PRPs) and the background human genome (nonrepetitive, noncoding) that align with each comparison species. For these box plots, the center line of each box is the median, the box extends from the 25th to 75th percentiles, and the feathers extend to 1.5 times the interquartile distance. The boxes are colored by feature set according to the legend along the top.
(B) Barplots showing the fraction of intervals with >50% alignability for each feature set and for the background.
(C) Decay of mean alignability as a function of phylogenetic distance. The mean alignabilities of the background human genome and intervals in each feature set are plotted against the distance from human to each comparison species. The distance is measured as the total substitutions per 4D site on each of the branches connecting human to the comparison species. The common name for each comparison species is given below the barplots in B and is connected to the phylogenetic distance in C by dotted lines. The data are best fit by two decay curves, one for primates with a slow rate of change and the other for horse to medaka. The curves shown are the fits to the data points from horse to medaka. (Statistics and coefficients for these fits are in Supplemental Table S4.)
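
The summary statistics named in panels A and B of Figure 7 can be sketched as follows. This is a minimal illustration under stated assumptions: the input values are made up, the function names are hypothetical, quartiles use Python's "inclusive" interpolation, and the whisker ends follow the common convention of the most extreme data point within 1.5 times the interquartile range of the box. The paper's exact computation may differ.

```python
# Hypothetical sketch: box-plot statistics (panel A) and the fraction of
# intervals with >50% alignability (panel B).
import statistics


def box_plot_summary(scores):
    """Return quartiles and whisker ends for a list of scores."""
    ordered = sorted(scores)
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1
    low_limit = q1 - 1.5 * iqr
    high_limit = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme data points inside the limits.
    inside = [s for s in ordered if low_limit <= s <= high_limit]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
    }


def fraction_above(alignabilities, threshold=0.5):
    """Fraction of intervals whose alignability exceeds `threshold`."""
    return sum(a > threshold for a in alignabilities) / len(alignabilities)


# Made-up per-interval values (not real data).
values = [0.0, 0.2, 0.4, 0.5, 0.6, 0.8, 0.9, 1.0]
summary = box_plot_summary(values)
high_fraction = fraction_above(values)
print(summary, high_fraction)
```

The comparison in `fraction_above` is strict, so an interval with exactly 50% alignability is not counted, matching the ">50%" wording in the caption.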
