Genome-wide synteny through highly sensitive sequence alignment: Satsuma

Manfred G Grabherr¹, Pamela Russell, Miriah Meyer, Evan Mauceli, Jessica Alföldi, Federica Di Palma, Kerstin Lindblad-Toh

¹ Broad Institute of MIT and Harvard, 7 Cambridge Center, Cambridge, MA 02142, USA. grabherr@broadinstitute.org

Bioinformatics. 2010 May 1;26(9):1145-51. doi: 10.1093/bioinformatics/btq102. Epub 2010 Mar 5.

PMID: 20208069
PMCID: PMC2859124
DOI: 10.1093/bioinformatics/btq102

Abstract

Motivation: Comparative genomics heavily relies on alignments of large and often complex DNA sequences. From an engineering perspective, the problem here is to provide maximum sensitivity (to find all there is to find), specificity (to only find real homology) and speed (to accommodate the billions of base pairs of vertebrate genomes).

Results: Satsuma addresses all three issues through novel strategies: (i) cross-correlation, implemented via fast Fourier transform; (ii) a match scoring scheme that eliminates almost all false hits; and (iii) an asynchronous 'battleship'-like search that allows for aligning two entire fish genomes (470 and 217 Mb) in 120 CPU hours using 15 processors on a single machine.
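The first strategy, cross-correlation via fast Fourier transform, can be illustrated with a minimal sketch. This is not Satsuma's C++ implementation: the function names `match_counts_by_shift` and `best_offset`, the one-indicator-signal-per-base encoding, and the use of NumPy are assumptions made here for illustration only. The idea is that correlating the indicator signals of two sequences gives, for every relative shift at once, the number of matching bases.

```python
import numpy as np

BASES = "ACGT"


def match_counts_by_shift(query: str, target: str) -> np.ndarray:
    """Count matching bases for every relative shift of query against target.

    counts[k] is the number of identical bases when query position 0 is
    placed at target position k - (len(query) - 1).
    """
    n = len(query) + len(target) - 1
    # Pad to a power of two >= n so the circular FFT product equals the
    # linear correlation (no wrap-around).
    size = 1 << (n - 1).bit_length()
    total = np.zeros(n)
    for base in BASES:
        # Indicator signals: 1.0 where the sequence carries this base.
        q = np.array([c == base for c in query], dtype=float)
        t = np.array([c == base for c in target], dtype=float)
        # Cross-correlation = convolution of target with the reversed query.
        spectrum = np.fft.rfft(t, size) * np.fft.rfft(q[::-1], size)
        total += np.fft.irfft(spectrum, size)[:n]
    return np.rint(total).astype(int)


def best_offset(query: str, target: str) -> tuple[int, int]:
    """Return (offset of query within target, number of matching bases)."""
    counts = match_counts_by_shift(query, target)
    k = int(np.argmax(counts))
    return k - (len(query) - 1), int(counts[k])


if __name__ == "__main__":
    query = "ACGTACGGA"
    target = "TTTTT" + query + "TTTTT"
    print(best_offset(query, target))
```

The cost is dominated by the transforms, so all shifts are scored in O(n log n) time rather than the O(n·m) of comparing every shift directly. A gap-free correlation of this kind only locates candidate diagonals; scoring and filtering of matches, as described in strategy (ii), is a separate step not shown here.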

Availability: Satsuma is part of the Spines software package, implemented in C++ on Linux. The latest version of Spines can be freely downloaded under the LGPL license from http://www.broadinstitute.org/science/programs/genome-biology/spines/.

Figures

**Fig. 1.**
MizBee is a multiscale synteny browser that interfaces with Satsuma to enable efficient exploration of conserved syntenic data (Meyer *et al.*, 2009). Shown here are results from Satsuma on the stickleback–pufferfish dataset. On the left is the genome view, where the stickleback genome is shown in the outer ring, and the pufferfish genome on the inner ring along with the user-selected chromosome 1 from the stickleback genome. The connecting edges indicate the location of conserved syntenic blocks between the two species, and the edge color is determined based on the linked pufferfish chromosome. In the middle of the window is the chromosome view that provides details about the size and location of the syntenic blocks for the selected stickleback chromosome, along with the average similarity score for each block shown in the histogram to the right of the bar. The rightmost view shows a user-selected syntenic block, where information about the similarity, orientation, location and size of conserved features within the block is shown. The three views are linked using a variety of mechanisms, such as color, interaction and highlighting, and users interactively select chromosomes and blocks using either the mouse or keyboard. MizBee, and the shown Satsuma dataset, can be freely downloaded from http://mizbee.org.

**Fig. 2.**
Specificity and sensitivity. (A) Specificity: the probability, as predicted by Satsuma's match probability model, that a single gap-free alignment of given length, identity and GC/AT composition is found by random chance given the target genome size (lizard, 1.7 Gb), shown in light colors (y-axis) over the match identity (x-axis). This is compared to a Null model (see Section 2) shown in dark colors. (B) Sensitivity: identical sequences of different lengths (50, 100, 150 and 200 bp) were inserted into non-homologous DNA from the human genome and mutated by randomly changing bases over the entire region (base substitution rate, x-axis, top), resulting in a decrease in sequence identity (x-axis, bottom). Each bar indicates that the alignment was correctly identified by Satsuma given the length of the sequence to be found over increasing mutation rates.
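The sensitivity experiment in panel (B) can be mimicked with a small simulation. The sketch below is only an illustration of the setup described in the caption, not the authors' procedure: the helper names `mutate`, `identity` and `embed` are hypothetical, and it assumes that every substitution replaces a base with a different base, so that the expected identity is one minus the substitution rate. The caption does not say whether that assumption holds in the original experiment.

```python
import random

BASES = "ACGT"


def mutate(seq: str, rate: float, rng: random.Random) -> str:
    """Substitute each base independently with probability `rate`.

    Assumption: a substituted base is always replaced by a different base.
    """
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def identity(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def embed(insert: str, background: str, position: int) -> str:
    """Place `insert` into `background` at the given position."""
    return background[:position] + insert + background[position:]


if __name__ == "__main__":
    rng = random.Random(42)
    original = "".join(rng.choice(BASES) for _ in range(200))
    background = "".join(rng.choice(BASES) for _ in range(1000))
    for rate in (0.0, 0.1, 0.2, 0.3, 0.4):
        mutated = mutate(original, rate, rng)
        target = embed(mutated, background, 400)
        print(f"rate={rate:.1f} identity={identity(original, mutated):.2f} "
              f"target length={len(target)}")
```

An aligner under test would then be asked to recover the embedded region at each rate, which is the quantity the bars in panel (B) summarize.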

**Fig. 3.**
(A) Synteny between pufferfish chromosome 1 (x-axis) and stickleback chromosomes a (blue), b (red) and c (black). (B) Zoom-in on a region, depicting the search space (white) overlaid with the matches (red and blue).

**Fig. 4.**
Alignment of two grass genomes: *Oryza sativa* (x-axis, target, one graph per chromosome), and *Sorghum bicolor* (y-axis, query, chromosomes are color-coded). The plots show re-arrangements and large-scale segmental duplications in both genomes.

**Fig. 5.**
Comparison of Satsuma and blastz chained alignments (http://genome.ucsc.edu). Datasets: (A) human/dog (∼60 Myr apart, Lindblad-Toh *et al.*, 2005); (B) human/opossum (∼130 Myr apart, Mikkelsen *et al.*, 2007); and (C) human/chicken (∼350 Myr apart, International Chicken Genome Sequencing Consortium, 2004). Shown are the blastz-only bases (black) and Satsuma-only bases (red), i.e. bases in alignments that were found by only one of the two alignment programs and are neither adjacent to nor overlapping with matches found by the other program. Bases in overlaps (i.e. both aligners found the same bases) are shown in light gray, and bases in alignments adjacent to alignments found by both aligners are shown in dark gray and pink.

Cited by

Chromosome-Level Reference Genome Assembly for the American Pika (Ochotona princeps).
Sjodin BMF, Galbreath KE, Lanier HC, Russello MA. J Hered. 2021 Nov 1;112(6):549-557. doi: 10.1093/jhered/esab031. PMID: 34036348. Free PMC article.
Whole genome resequencing and comparative genome analysis of three Puccinia striiformis f. sp. tritici pathotypes prevalent in India.
Yadav IS, Bhardwaj SC, Kaur J, Singla D, Kaur S, Kaur H, Rawat N, Tiwari VK, Saunders D, Uauy C, Chhuneja P. PLoS One. 2022 Nov 3;17(11):e0261697. doi: 10.1371/journal.pone.0261697. eCollection 2022. PMID: 36327308. Free PMC article.
fagin: synteny-based phylostratigraphy and finer classification of young genes.
Arendsee Z, Li J, Singh U, Bhandary P, Seetharam A, Wurtele ES. BMC Bioinformatics. 2019 Aug 27;20(1):440. doi: 10.1186/s12859-019-3023-y. PMID: 31455236. Free PMC article.
Rapid morphological divergence following a human-mediated introduction: the role of drift and directional selection.
Sendell-Price AT, Ruegg KC, Clegg SM. Heredity (Edinb). 2020 Apr;124(4):535-549. doi: 10.1038/s41437-020-0298-8. Epub 2020 Feb 20. PMID: 32080374. Free PMC article.
A universal genomic coordinate translator for comparative genomics.
Zamani N, Sundström G, Meadows JR, Höppner MP, Dainat J, Lantz H, Haas BJ, Grabherr MG. BMC Bioinformatics. 2014 Jun 30;15:227. doi: 10.1186/1471-2105-15-227. PMID: 24976580. Free PMC article.
