Nucleic Acids Res. 2002 May 15;30(10):2212-23.
doi: 10.1093/nar/30.10.2212.

Connected gene neighborhoods in prokaryotic genomes

Igor B Rogozin et al. Nucleic Acids Res. 2002.

Abstract

A computational method was developed for delineating connected gene neighborhoods in bacterial and archaeal genomes. These gene neighborhoods are typically not present in their entirety in any single genome, but are held together by overlapping, partially conserved gene arrays. The procedure was applied to compare the orders of orthologous genes, extracted from the database of Clusters of Orthologous Groups of proteins (COGs), in 31 prokaryotic genomes. This identified 188 clusters of gene arrays, which included 1001 of 2890 COGs. These clusters were projected onto actual genomes to produce extended neighborhoods that include additional genes adjacent to the genes from the clusters and transcribed in the same direction; in total, 2387 COGs were included in the neighborhoods. Most of the neighborhoods consist predominantly of genes united by a coherent functional theme, but also include a minority of genes without an obvious functional connection to the main theme. We hypothesize that although some of the latter genes might have unsuspected roles, others are maintained within gene arrays because of the advantage of expression at a level that is typical of the given neighborhood. We designate this phenomenon 'genomic hitchhiking'. The largest neighborhood includes 79 genes (COGs) and consists of overlapping, rearranged ribosomal protein superoperons; apparent genomic hitchhiking is particularly typical of this neighborhood and of other neighborhoods that consist of genes coding for translation machinery components. Several neighborhoods involve previously undetected connections between genes, allowing new functional predictions. Gene neighborhoods appear to evolve via complex rearrangement, with different combinations of genes from a neighborhood fixed in different lineages.
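
The idea that a neighborhood can be "held together" by overlapping, partially conserved gene pairs without being present in full in any one genome can be illustrated with a small sketch. This is not the authors' procedure: the function names, the `max_gap` and `min_genomes` parameters, the toy genomes and the `COG_A`-style labels are all hypothetical, and plain connected components stand in for the paper's array and cluster construction, which is more selective.

```python
from collections import defaultdict

def conserved_pairs(genomes, max_gap=1, min_genomes=2):
    """Find co-directed gene pairs seen in at least `min_genomes` genomes.

    `genomes` maps a genome name to an ordered list of (cog_id, strand),
    where cog_id is None for a gene without a COG assignment.
    Up to `max_gap` genes may sit between the two members of a pair.
    """
    support = defaultdict(set)
    for name, genes in genomes.items():
        for i, (cog_a, strand_a) in enumerate(genes):
            if cog_a is None:
                continue
            last = min(i + 2 + max_gap, len(genes))
            for j in range(i + 1, last):
                cog_b, strand_b = genes[j]
                if strand_b != strand_a:
                    break  # assumption: a strand switch ends the run
                if cog_b is None or cog_b == cog_a:
                    continue
                # Orient the pair along the direction of transcription.
                pair = (cog_a, cog_b) if strand_a == "+" else (cog_b, cog_a)
                support[pair].add(name)
    return {p: g for p, g in support.items() if len(g) >= min_genomes}

def connected_neighborhoods(pairs):
    """Group COGs linked by conserved pairs (simple union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)
    groups = defaultdict(set)
    for cog in list(parent):
        groups[find(cog)].add(cog)
    return sorted(groups.values(), key=len, reverse=True)

# Toy data: no genome carries all four COGs, yet they form one neighborhood.
genomes = {
    "g1": [("COG_A", "+"), ("COG_B", "+")],
    "g2": [("COG_A", "+"), (None, "+"), ("COG_B", "+"), ("COG_C", "+")],
    "g3": [("COG_B", "+"), ("COG_C", "+")],
    "g4": [("COG_D", "-"), ("COG_C", "-")],
    "g5": [("COG_C", "+"), ("COG_D", "+"), ("COG_Z", "+")],
}
pairs = conserved_pairs(genomes)
neighborhoods = connected_neighborhoods(pairs)
print(sorted(pairs))
print(neighborhoods)
```

In the toy data, pairs seen in only one genome are filtered out, while the surviving pairs overlap and chain four COGs into a single group that no individual genome contains in full.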

Figures

Figure 1
A flow chart of the procedure for construction of arrays and clusters from conserved gene pairs. Colored arrows indicate COGs that form conserved pairs and open arrows indicate COGs or non-COG genes that do not form conserved pairs, but are allowed to insert between genes in a pair.
Figure 2
Representation of COGs of the same functional category and of different functional categories in conserved gene pairs.
Figure 3
A cluster of gene arrays presented as a directed graph. Nodes correspond to COGs, the COG numbers are indicated inside the circles. The edges show conserved gene pairs and the direction of transcription of the corresponding genes is shown by arrows. The blue circles and red arrows show the depicted cluster. The open circles and black arrows show genes and gene pairs that are linked to individual COGs in the given cluster, but did not join it under the procedure employed. The number of genomes in which the given pair is represented is given for each edge, and the thickness of the edge is roughly proportional to this number. This example shows cluster 14. The rank of the cluster (neighborhood) in this and other figures was determined by the descending order of the number of genes (COGs) in the core cluster as shown in Table 1. COG0130, pseudouridine synthase; COG0184, ribosomal protein S15P/S13E; COG0195, transcription elongation/anti-termination factor (NusA); COG0196, FAD synthase; COG0532, translation initiation factor 2 (GTPase); COG0612, predicted Zn-dependent peptidase; COG0779, uncharacterized conserved protein; COG0858, ribosome-binding factor A; COG1185, polyribonucleotide nucleotidyltransferase; COG2740, uncharacterized conserved protein.
Figure 4
Distribution of clusters of gene arrays by the number of genes (A) and species (B).
Figure 5
Fragments of the ribosomal protein gene neighborhood 1 containing apparent hitchhiker genes. Colored arrows indicate the COGs that belong to the ribosomal protein gene neighborhood; open arrows indicate inserted genes. (A) The gene for the glycolytic enzyme enolase is part of the ribosomal protein gene cluster in Euryarchaeota. COG0102, large subunit ribosomal protein L13; COG0103, small subunit ribosomal protein S9; COG1644, DNA-directed RNA polymerase, subunit N; COG1758, DNA-directed RNA polymerase, subunit K; COG0148, enolase. (B) Proteobacterial ribosomal protein cluster includes genes for stringent starvation response proteins, which appear to be functionally linked to translation, and genes for electron transfer chain components, probable hitchhikers. COGs absent in (A): COG0723, Rieske Fe-S cluster protein; COG1290, cytochrome b subunit of the bc complex; COG2857, cytochrome c1; COG0625, stringent starvation protein A (glutathione S-transferase); COG2969, stringent starvation protein B; COG0583, transcriptional regulator; COG0327, uncharacterized conserved protein.
Figure 6
Neighborhood 20: unexpected connection between the Holliday junction resolvasome and protein translocase. COG0217, uncharacterized conserved protein; COG0817, endonuclease subunit of the resolvasome; COG2255, helicase subunit of the resolvasome; COG0809, queuine-tRNA-ribosyltransferase (QueA); COG0343, queuine-tRNA-ribosyltransferase (Tgt); COG1862, COG0342, COG0341, subunits of protein translocase (the Sec complex); COG0425, predicted regulator of disulfide bond formation; COG3158, potassium transporter; COG1826, component of a Sec-independent protein secretion pathway; COG0805, component of a Sec-independent protein secretion pathway.
Figure 7
Probable gene hitchhiking: independent incorporation of the FAD synthase gene in two translation-related neighborhoods. (A) Neighborhood 14: The list of COGs is as in Figure 3. (B) Neighborhood 50: COG0728, uncharacterized membrane protein; COG0196, FAD synthase; COG0060, isoleucyl-tRNA synthetase; COG0597, lipoprotein signal peptidase; COG1047, FKBP-like peptidyl-prolyl cis-trans isomerase; COG0761, membrane protein, penicillin tolerance determinant.
Figure 8
Apparent operon car-pooling: association of diverse functional themes (neighborhood 13). COG0481, membrane-associated GTPase; COG0681, signal peptidase; COG0571, RNase III; COG1159, ribosome-associated GTPase; COG1381, recombinational repair pathway component; COG0854, enzyme of pyridoxal phosphate biosynthesis; COG0621, 2-methylthioadenine synthetase; COG1702, predicted ATPase involved in phosphate regulon regulation; COG1480, predicted hydrolase of the HD family; COG0319, predicted metal-dependent hydrolase; COG1253, CBS-domain-containing protein; COG0818, diacylglycerol kinase; COG0295, cytidine deaminase.
Figure 9
A tentative scenario for the evolution of a gene neighborhood. The example analyzed is neighborhood 14, which is shown in Figure 7A. The designations of the genes are as in Figure 7A. The four types of postulated evolutionary events are shown by color-coded circles.
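
The directed-graph representation described in the Figure 3 caption (COGs as nodes, conserved pairs as edges directed along transcription, each edge labeled with the number of genomes containing the pair) could be held in a structure like the one below. This is only an illustrative sketch: the pair counts, the `COG_A`-style labels, the `core` set and the linear width scaling in `edge_width` are invented for the example and are not taken from the paper.

```python
from collections import defaultdict

# Hypothetical input: (upstream COG, downstream COG) -> number of genomes.
pair_support = {
    ("COG_A", "COG_B"): 12,
    ("COG_B", "COG_C"): 7,
    ("COG_C", "COG_D"): 3,
    ("COG_B", "COG_X"): 3,  # linked to the cluster but not part of it
}
core = {"COG_A", "COG_B", "COG_C", "COG_D"}  # assumed cluster membership

def build_graph(pair_support):
    """Adjacency map: upstream COG -> {downstream COG: genome count}."""
    graph = defaultdict(dict)
    for (up, down), n_genomes in pair_support.items():
        graph[up][down] = n_genomes
    return dict(graph)

def classify_edges(graph, core):
    """Split edges into those inside the cluster and those only touching it."""
    inside, attached = [], []
    for up, targets in graph.items():
        for down, n_genomes in targets.items():
            if up in core and down in core:
                inside.append((up, down, n_genomes))
            elif up in core or down in core:
                attached.append((up, down, n_genomes))
    return inside, attached

def edge_width(n_genomes, max_genomes, max_width=6.0):
    """Drawing width roughly proportional to the genome count."""
    return max(1.0, max_width * n_genomes / max_genomes)

graph = build_graph(pair_support)
inside, attached = classify_edges(graph, core)
max_n = max(pair_support.values())
for up, down, n in inside + attached:
    kind = "cluster" if (up, down, n) in inside else "attached"
    print(f"{up} -> {down}  genomes={n}  width={edge_width(n, max_n):.1f}  {kind}")
```

Keeping the genome count on each edge makes it straightforward to separate edges within a cluster from edges that merely touch it, mirroring the two edge styles the caption describes.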
