Comparative Study
Genome Biol. 2002;3(12):RESEARCH0085.
doi: 10.1186/gb-2002-3-12-research0085. Epub 2002 Dec 31.

Heterochromatic sequences in a Drosophila whole-genome shotgun assembly

Roger A Hoskins et al. Genome Biol. 2002.

Abstract

Background: Most eukaryotic genomes include a substantial repeat-rich fraction termed heterochromatin, which is concentrated in centric and telomeric regions. The repetitive nature of heterochromatic sequence makes it difficult to assemble and analyze. To better understand the heterochromatic component of the Drosophila melanogaster genome, we characterized and annotated portions of a whole-genome shotgun sequence assembly.

Results: WGS3, an improved whole-genome shotgun assembly, includes 20.7 Mb of draft-quality sequence not represented in the Release 3 sequence spanning the euchromatin. We annotated this sequence using the methods employed in the re-annotation of the Release 3 euchromatic sequence. This analysis predicted 297 protein-coding genes and six non-protein-coding genes, including known heterochromatic genes, and regions of similarity to known transposable elements. Bacterial artificial chromosome (BAC)-based fluorescence in situ hybridization analysis was used to correlate the genomic sequence with the cytogenetic map in order to refine the genomic definition of the centric heterochromatin; on the basis of our cytological definition, the annotated Release 3 euchromatic sequence extends into the centric heterochromatin on each chromosome arm.

Conclusions: Whole-genome shotgun assembly produced a reliable draft-quality sequence of a significant part of the Drosophila heterochromatin. Annotation of this sequence defined the intron-exon structures of 30 known protein-coding genes and 267 protein-coding gene models. The cytogenetic mapping suggests that an additional 150 predicted genes are located in heterochromatin at the base of the Release 3 euchromatic sequence. Our analysis suggests strategies for improving the sequence and annotation of the heterochromatic portions of the Drosophila and other complex genomes.
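The gene counts in the Results and Conclusions can be cross-checked with simple arithmetic. The sketch below uses only figures stated in the abstract; the variable names are illustrative and do not come from the paper.

```python
# Counts as stated in the abstract (variable names are illustrative).
known_genes = 30      # known protein-coding genes with defined intron-exon structures
gene_models = 267     # additional protein-coding gene models
non_coding_genes = 6  # non-protein-coding genes

# The Conclusions split the protein-coding total reported in the Results.
protein_coding_total = known_genes + gene_models
all_annotated_genes = protein_coding_total + non_coding_genes

print(protein_coding_total)  # 297
print(all_annotated_genes)   # 303
```

The 297 total matches the number of predicted protein-coding genes given in the Results.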

Figures

Figure 1
Chromosome structure of Drosophila melanogaster. The left and right arms of chromosomes 2 (2L, 2R) and 3 (3L, 3R), the small chromosome 4, and the sex chromosomes X and Y are shown (adapted from [12]). The numbers correspond to lengths in megabases. The euchromatic portions of the chromosome arms (white) correspond to the Release 3 euchromatic sequence described in Celniker et al. [47]. The lengths of the heterochromatic portions of the chromosome arms (green) are estimated from measurements of mitotic chromosomes [76]. The length of the heterochromatin on the X chromosome is polymorphic among strains and can comprise from one-third to one-half of the length of the mitotic chromosome. Our cytogenetic experiments show that Release 3 euchromatic sequence (white) extends into the centric heterochromatin by approximately 2.1 Mb (see Results).
Figure 2
Distribution of scaffold lengths in the WGS3 heterochromatic sequence. (a) Histogram of the number (indicated above each bar) of sequence scaffolds in each of the indicated size ranges (kb). (b) Histogram of the sequence total (Mb; indicated above each bar) represented in the scaffolds in each of the indicated size ranges (kb).
Figure 3
Comparison of WGS3 and Release 2 sequence assemblies of the rolled region. (a) Genomic organization of the rolled gene. Exons are shown as black boxes numbered 1 to 7, and introns are shown by the thin black line. All exons are present in a single WGS3 scaffold; exons 4 and 7 are absent from Release 2. (b) Thirteen Release 2 sequence scaffolds are shown as red bars. Thick portions of bars show regions aligned to WGS3, and thin portions show unaligned regions corresponding to sequence gaps. Scaffolds are labeled with the GenBank accession numbers, all of which begin 'AE00' and end in the indicated four digits: for example, AE003202. (c) The 252-kb WGS3 heterochromatic sequence scaffold 211000022279977 (Scaffold 79977), shown by the blue bar, links the 13 Release 2 scaffolds. The thin portions of the bar represent sequence gaps.
Figure 4
Evidence supporting the gene models. The diagram shows the numbers of curated gene models supported by evidence in three classes. Sequence alignments to Drosophila ESTs and, in parentheses, full-insert cDNA sequences were determined using sim4 (yellow circle). Gene predictions were made using Genie and Genscan (red circle). Similarities to known and predicted genes and proteins in Drosophila and other organisms were determined using BLASTX and TBLASTX (blue circle). The intersections in the diagram show the number of gene models that are supported by multiple evidence types. For example, there are 80 models supported by all three types of evidence, and 46 of these are represented by full-insert cDNA sequence. The 89 gene models supported only by gene prediction include 22 models that were predicted only in the masked sequence.
Figure 5
Gene models with weak similarity to transposable elements (TE). The percentage of gene models with TBLASTX similarity to known transposable elements at E-values from 1 × 10⁻² to 1 × 10⁻¹⁰ is shown. The data are very similar for the Release 3 euchromatic (blue line) and WGS3 heterochromatic (pink line) annotations. The eight (2.7%) curated gene models in the WGS3 heterochromatin with E-values ≤ 1 × 10⁻¹⁰ were examined further, as described in the text.
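The 2.7% figure in this caption is consistent with taking the eight models as a share of the 297 protein-coding gene models reported in the abstract. That denominator is an assumption made here for illustration; the caption does not state it.

```python
# Assumption: the percentage is relative to the 297 curated protein-coding models.
flagged_models = 8
total_models = 297

share = flagged_models / total_models
print(f"{share:.1%}")  # 2.7%
```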
Figure 6
Annotation of the light (lt) region. The 594-kb WGS3 scaffold 211000022280798 and twelve curated gene models are shown. The WGS3 sequence is shown as a bar with sequence gaps (black), transposable elements and simple repeats that were masked and removed during the annotation process (red), and presumed single-copy sequences that remained after masking (gray) indicated. Gene models are shown as blue bars with exons (thick) and introns (thin) indicated. Those above the line are transcribed on the forward strand, and those below the line are transcribed on the reverse strand. The average density of curated genes is one per 50 kb, about six- to sevenfold lower than the density in the euchromatin [12,48]. Only the lt and cta genes are identified by genetic analysis. Seven gene models were described in the Release 1 annotation [12]. This annotation provides a more accurate view of the structures of nearly all of the gene models and determines their relative locations. cDNA sequence alignment allowed us to merge two Release 2 gene models, Chitinase 1 and 3, into a single gene Cht3 with multiple chitin-binding and catalytic domains. Two of the three new curated genes (CG40006, CG40016) are represented by multiple ESTs and cDNAs. CG40005 is based solely on BLAST evidence; its similarity to the adjacent cta gene suggests a possible sequence assembly artifact. On the basis of the masking results, known transposable element sequences account for 302 kb (51%) of the sequence scaffold.
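The density and repeat-content figures in this caption follow from the scaffold length, the number of curated gene models, and the masked transposable element total. A minimal sketch of that arithmetic is given below. The variable names are illustrative, and the implied euchromatic spacing is derived here from the stated six- to sevenfold ratio rather than quoted from the paper.

```python
# Values as stated in the Figure 6 caption (variable names are illustrative).
scaffold_kb = 594    # length of the WGS3 scaffold
curated_genes = 12   # curated gene models on the scaffold
te_kb = 302          # sequence attributed to known transposable elements

kb_per_gene = scaffold_kb / curated_genes
te_fraction = te_kb / scaffold_kb

print(f"{kb_per_gene:.1f} kb per gene")      # 49.5 kb per gene
print(f"{te_fraction:.1%} of the scaffold")  # 50.8% of the scaffold

# Derived, not quoted: a six- to sevenfold higher euchromatic density
# would correspond to roughly one gene per 7 to 8 kb.
implied_low = kb_per_gene / 7
implied_high = kb_per_gene / 6
print(f"{implied_low:.2f} to {implied_high:.2f} kb per gene")  # 7.07 to 8.25 kb per gene
```

Rounded, these values agree with the caption's "one per 50 kb" and "51%".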
Figure 7
Annotation of the rolled gene. Results from the computational annotation pipeline for the portion of WGS3 scaffold 211000022279977 containing the rolled gene are displayed in Apollo. Evidence (black panels) used to annotate gene models (light blue panels) is shown. Evidence for gene models includes alignments of BLASTX results (red), cDNA sequences (green), and results of gene prediction (lavender). The curated structure of two rolled transcript models is defined by cDNA sequences. The Release 2 annotation (blue) did not include a complete rolled gene model. The predicted start (green) and stop (red) codons are indicated in the gene models. (a) In the annotation of the WGS3 masked sequence, in which transposable elements were removed, the Genscan prediction and the BLASTX results are consistent with the curated gene structure. The BLASTX evidence in the 3' intron of rolled identifies an unmasked transposable element (yellow arrow). (b) In the unmasked WGS3 sequence, which includes known transposable elements (purple), Genscan fails to predict the first five exons of rolled, predicts two gene models within transposable elements, and adds three spurious exons to an inaccurate rolled gene model.
Figure 8
The boundaries of the centric heterochromatin defined by FISH. BACs near the centric ends of the Release 3 chromosome arm sequence spanning the euchromatin [47] were localized by FISH to mitotic chromosomes to correlate the cytological boundaries of the centric heterochromatin with the genomic sequence (see Materials and methods). (a) Results for chromosome 3 are shown. Locations of BACs on the cytogenetic map (bands h1-61) are indicated by arrows. The left (3L) and right (3R) arms, and the centromere (C), are indicated. BAC names are indicated below each image, and images are oriented with the left arm at the top. See Table 1 for complete BAC names and additional information. (b) An example of the quantitative analysis used to determine BAC locations (red and green) relative to the DAPI (blue) banding pattern is shown (see Materials and methods).

References

    1. Heitz E. Das Heterochromatin der Moose. I. Jahrb Wiss Botanik. 1928;69:762–818.
    2. John B. The biology of heterochromatin. In: Verma RS, editor. Heterochromatin: Molecular and Structural Aspects. Cambridge: Cambridge University Press; 1988. pp. 1–147.
    3. Elgin SC, Workman JL. Chromosome and expression mechanisms: a year dominated by histone modifications, transitory and remembered. Curr Opin Genet Dev. 2002;12:127–129.
    4. Weiler KS, Wakimoto BT. Heterochromatin and gene expression in Drosophila. Annu Rev Genet. 1995;29:577–605.
    5. Gatti M, Pimpinelli S. Functional elements in Drosophila melanogaster heterochromatin. Annu Rev Genet. 1992;26:239–275.
