Comparative Study
Genome Res. 2000 Aug;10(8):1148-57.
doi: 10.1101/gr.10.8.1148.

Alfresco--a workbench for comparative genomic sequence analysis

N Jareborg et al. Genome Res. 2000 Aug.

Abstract

Comparative analysis of genomic sequences provides a powerful tool for identifying regions of potential biologic function; by comparing corresponding regions of genomes from suitable species, protein coding or regulatory regions can be identified by their homology. This requires the use of several specific types of computational analysis tools. Many programs exist for these types of analysis, but few exist for viewing and controlling the results as a whole, which is necessary for large-scale genomic sequence analysis. Using Java, we have developed a new visualization tool that allows effective comparative genome sequence analysis. The program handles a pair of sequences from putatively homologous regions in different species. Results from existing external analysis programs, such as database searching, gene prediction, repeat masking, and alignment programs, are visualized and used to find corresponding functional sequence domains in the two sequences. The user interacts with the program through a graphic display of the genome regions, in which an independently scrollable and zoomable symbolic representation of the sequences is shown. As an example, the analysis of two unannotated orthologous genomic sequences from human and mouse containing parts of the UTY locus is presented.

Figures

Figure 1
Alfresco user interface. (A) The display area shows a representation of two EMBL sequence entries of the orthologous mouse and human keratin 18 genes (mouse: ID MMENDOBA, AC M22832; human: ID HSKER101, AC M24842). Conserved regions are connected by gray lines. Dark yellow boxes represent conserved regions found by BLASTN/MSPcrunch; boxes in different shades of blue represent conserved regions found by DBA (see text). Gene structures are represented by blue borders and a name tag, with “>” and “<” indicating the direction of transcription. Coding and noncoding exons are indicated in green and yellow, respectively. The gene structures shown are the result of analyses with Blastwise, blEst_genome, and Genscan, respectively. Sequence repeats found by RepeatMasker are indicated by boxes in blue (LINEs), green (SINEs), brown (low complexity), and pink (simple repeats). CpG islands predicted by Cpg are shown as yellow boxes around the start of the first introns. The attributes of selected features (colored red) can be manually edited through a dialog window (seen on the right). (B) Subregions of the sequences can be selected and subjected to further analysis. To the right of the main window are a window displaying alignments of conserved regions found by DBA and a Dotter window of the selected regions. Conserved blocks found by DBA can be added to the display as features of the entries. Positions of features defined in Alfresco are exported to Dotter and displayed along the sides of the dot plot. (C) Regions similar to any other region can be identified using the “Region set parameters” dialog. These regions are displayed as dark gray boxes. The Cutoff scrollbar selects the threshold of similarity. Clicking on a region (the red box in the figure) indicates which other regions belong to the same set by displaying blue boxes above the sequence representations. A selected Region set can be permanently added to the display with a choice of colors.
Figure 2
Analysis of the mouse and human UTY genomic regions. (A) Overall view of the region. The mouse sequence (AC006508) is shown at the top and the human sequence (AC006376) at the bottom. Gene structures are the result of analysis with blEst_genome and Blastwise. The sequence IDs of the RNA and protein entries from EMBL, SWISSPROT, and TREMBL that have been aligned to the genomic sequences are shown above the gene structure representations. Regions of similarity detected by BLASTN/MSPcrunch and DBA are connected by gray lines. Regions covered by repeats are represented as colored boxes on the lines representing the sequences. (B) Exons 11–16. Exon positions are derived from GeneWise alignments of the mouse and human protein sequences (Swissprot id: UTY_MOUSE and UTY_HUMAN) with the two genomic sequences as indicated. Dark-yellow blocks connected by gray lines are conserved regions found by BLASTN/MSPcrunch; blue boxes are regions found by DBA. The mouse sequence uses exon 13 but not exons 15 and 16. The human sequence uses exons 15 and 16 but not exon 13. Exon 13 is conserved in both species and parts of exon 16 are also conserved in both species, whereas exon 15 is not. The conservation of these exons indicates the possibility of alternatively spliced transcripts in both species. (C) Conservation of noncoding regions of the 5′ end of the UTY gene. Blocks in different shades of blue connected by gray lines are conserved regions found by DBA. The upstream region contains one conserved block upstream of a possible TATA box (short light-blue region). The 5′ UTR contains two conserved blocks, and the first and second introns each contain one conserved region that is separate from the conservation seen around the exons. These blocks contain several conserved transcription factor-binding sites (see text). A predicted CpG island represented by a yellow box covers parts of the first exon and intron in the human sequence.
