Bioinformatics. 2010 May 1;26(9):1145-51.
doi: 10.1093/bioinformatics/btq102. Epub 2010 Mar 5.

Genome-wide synteny through highly sensitive sequence alignment: Satsuma

Manfred G Grabherr et al. Bioinformatics.

Abstract

Motivation: Comparative genomics heavily relies on alignments of large and often complex DNA sequences. From an engineering perspective, the problem here is to provide maximum sensitivity (to find all there is to find), specificity (to only find real homology) and speed (to accommodate the billions of base pairs of vertebrate genomes).

Results: Satsuma addresses all three issues through novel strategies: (i) cross-correlation, implemented via fast Fourier transform; (ii) a match scoring scheme that eliminates almost all false hits; and (iii) an asynchronous 'battleship'-like search that allows for aligning two entire fish genomes (470 and 217 Mb) in 120 CPU hours using 15 processors on a single machine.
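
Strategy (i) can be illustrated with a small example. The sketch below is not Satsuma's implementation; it assumes one common encoding, in which each sequence is split into four 0/1 indicator signals (one per base), each pair of signals is cross-correlated through the FFT, and the four results are summed to give the number of matching bases at every relative shift. The function names `fft` and `match_counts` are hypothetical, and the pure-Python radix-2 FFT is only there to keep the example self-contained; a production tool would use an optimized FFT library.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two.
    The inverse transform is returned unnormalized (caller divides by n)."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def match_counts(query, target):
    """Return {shift: matching bases}, where shift s aligns query[i]
    with target[i + s]. Characters other than A, C, G, T never match."""
    size = 1
    while size < len(query) + len(target):  # zero-pad to avoid wrap-around
        size *= 2
    total = [0.0] * size
    for base in "ACGT":
        q = [1.0 if c == base else 0.0 for c in query]
        t = [1.0 if c == base else 0.0 for c in target]
        q += [0.0] * (size - len(q))
        t += [0.0] * (size - len(t))
        fq, ft = fft(q), fft(t)
        # Correlation theorem: corr = IFFT(conj(FFT(q)) * FFT(t))
        prod = [a.conjugate() * b for a, b in zip(fq, ft)]
        corr = fft(prod, inverse=True)
        for i in range(size):
            total[i] += corr[i].real / size
    # Negative shifts are stored at the end of the circular result.
    return {s: round(total[s % size])
            for s in range(-(len(query) - 1), len(target))}

counts = match_counts("GATTACA", "CCCGATTACATTT")
best = max(counts, key=counts.get)
print(best, counts[best])  # 3 7
```

The appeal of this approach is that all shifts are scored at once in O(n log n) time, rather than comparing the sequences base by base at each shift.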

Availability: Satsuma is part of the Spines software package, implemented in C++ on Linux. The latest version of Spines can be freely downloaded under the LGPL license from http://www.broadinstitute.org/science/programs/genome-biology/spines/.


Figures

Fig. 1.
MizBee is a multiscale synteny browser that interfaces with Satsuma to enable efficient exploration of conserved syntenic data (Meyer et al., 2009). Shown here are results from Satsuma on the stickleback–pufferfish dataset. On the left is the genome view, where the stickleback genome is shown in the outer ring, and the pufferfish genome in the inner ring along with the user-selected chromosome 1 from the stickleback genome. The connecting edges indicate the location of conserved syntenic blocks between the two species, and the edge color is determined by the linked pufferfish chromosome. In the middle of the window is the chromosome view, which provides details about the size and location of the syntenic blocks for the selected stickleback chromosome, along with the average similarity score for each block shown in the histogram to the right of the bar. The rightmost view shows a user-selected syntenic block, where information about the similarity, orientation, location and size of conserved features within the block is shown. The three views are linked using a variety of mechanisms, such as color, interaction and highlighting, and users interactively select chromosomes and blocks using either the mouse or keyboard. MizBee, and the Satsuma dataset shown, can be freely downloaded from http://mizbee.org.
Fig. 2.
Specificity and sensitivity. (A) Specificity: the probability, as predicted by Satsuma's match probability model, that a single gap-free alignment of given length, identity and GC/AT composition is found by random chance given the target genome size (lizard, 1.7 Gb) is shown in light colors (y-axis) against match identity (x-axis). This is compared to a Null model (see Section 2), shown in dark colors. (B) Sensitivity: identical sequences of different lengths (50, 100, 150 and 200 bp) were inserted into non-homologous DNA from the human genome and mutated by randomly changing bases over the entire region (base substitution rate, x-axis, top), resulting in a decrease in sequence identity (x-axis, bottom). Each bar marks the range of mutation rates over which Satsuma correctly identified the alignment for a sequence of the given length.
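
The sensitivity test in Fig. 2B can be mimicked on a small scale. The sketch below is an illustration under stated assumptions, not the authors' protocol: the background is random sequence rather than human DNA, only the planted copy is mutated, and every substitution picks a different base, so the expected identity is simply 1 minus the substitution rate. The helper names `mutate`, `identity` and `plant` are hypothetical.

```python
import random

BASES = "ACGT"

def mutate(seq, rate, rng):
    """Substitute each base independently with probability `rate`.
    Assumption: a substitution always yields a different base."""
    out = []
    for c in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in BASES if b != c]))
        else:
            out.append(c)
    return "".join(out)

def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def plant(insert, background, pos):
    """Insert a sequence into a background sequence at position `pos`."""
    return background[:pos] + insert + background[pos:]

rng = random.Random(0)  # fixed seed so the run is repeatable
original = "".join(rng.choice(BASES) for _ in range(200))
background = "".join(rng.choice(BASES) for _ in range(2000))
for rate in (0.1, 0.2, 0.3, 0.4):
    mutated = mutate(original, rate, rng)
    target = plant(mutated, background, 1000)  # sequence an aligner would search
    print(rate, round(identity(original, mutated), 2), len(target))
```

Running an aligner on each `target` and recording whether the planted copy is recovered would give one bar of a plot like Fig. 2B for that sequence length.
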
Fig. 3.
(A) Synteny between pufferfish chromosome 1 (x-axis) and stickleback chromosomes a (blue), b (red) and c (black). (B) Zoom-in on a region depicting the search space (white) overlaid with the matches (red and blue).
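
The restricted search space in Fig. 3B reflects the 'battleship'-like search named in the abstract. The abstract does not spell out the procedure, so the sketch below is only one plausible reading: the query/target plane is divided into a grid of blocks, a sparse set of seed blocks is probed first, and whenever a block yields a hit its neighbors are queued for probing. The function `battleship_search`, the `has_hit` callback (a stand-in for aligning one query block against one target block) and the seed spacing are all assumptions. The sketch is also single-threaded, whereas Satsuma's search is described as asynchronous.

```python
from collections import deque

def battleship_search(n_rows, n_cols, has_hit, seeds):
    """Probe seed cells, then expand only around cells that contain hits.
    Returns (cells with hits, all cells probed)."""
    queue = deque(seeds)
    probed = set()
    hits = set()
    while queue:
        cell = queue.popleft()
        if cell in probed:
            continue
        probed.add(cell)
        r, c = cell
        if has_hit(r, c):
            hits.add(cell)
            # A hit suggests more homology nearby: queue the 8 neighbors.
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < n_rows and 0 <= nc < n_cols
                            and (nr, nc) not in probed):
                        queue.append((nr, nc))
    return hits, probed

# Toy example: perfectly collinear synteny lies on the diagonal of a 20 x 20 grid.
n = 20
seeds = [(r, c) for r in range(0, n, 5) for c in range(0, n, 5)]
hits, probed = battleship_search(n, n, lambda r, c: r == c, seeds)
print(len(hits), len(probed), n * n)
```

In this toy case the whole diagonal is recovered while only a band of cells around it, plus the seeds, is ever probed, which is the kind of saving that makes whole-genome comparison tractable.
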
Fig. 4.
Alignment of two grass genomes: Oryza sativa (x-axis, target, one graph per chromosome) and Sorghum bicolor (y-axis, query, chromosomes are color-coded). The plots show rearrangements and large-scale segmental duplications in both genomes.
Fig. 5.
Comparison of Satsuma and blastz chained alignments (http://genome.ucsc.edu). Datasets: (A) human/dog (∼60 Myr apart, Lindblad-Toh et al., 2005); (B) human/opossum (∼130 Myr apart, Mikkelsen et al., 2007); and (C) human/chicken (∼350 Myr apart, International Chicken Genome Sequencing Consortium, 2004). Shown are the blastz-only bases (black) and Satsuma-only bases (red), i.e. bases in alignments that were found by only one of the two alignment programs and that are neither adjacent to nor overlapping with matches found by the other program. Bases in overlaps (i.e. both aligners found the same bases) are shown in light gray, and bases in alignments adjacent to alignments found by both aligners are shown in dark gray and pink.
