Comparative Study
BMC Evol Biol. 2008 Jun 18:8:176.
doi: 10.1186/1471-2148-8-176.

Evolution of C2H2-zinc finger genes and subfamilies in mammals: species-specific duplication and loss of clusters, genes and effector domains

Hamsa D Tadepally et al. BMC Evol Biol. 2008.

Abstract

Background: C2H2 zinc finger genes (C2H2-ZNF) constitute the largest class of transcription factors in humans and one of the largest gene families in mammals. Often arranged in clusters in the genome, these genes are thought to have undergone a massive expansion in vertebrates, primarily by tandem duplication. However, this view is based on limited datasets restricted to a single chromosome or a specific subset of genes belonging to the large KRAB domain-containing C2H2-ZNF subfamily.

Results: Here, we present the first comprehensive study of the evolution of the C2H2-ZNF family in mammals. We assembled the complete repertoire of human C2H2-ZNF genes (718 in total), about 70% of which are organized into 81 clusters across all chromosomes. Based on an analysis of their N-terminal effector domains, we identified two new C2H2-ZNF subfamilies encoding genes with a SET or a HOMEO domain. We searched for the syntenic counterparts of the human clusters in other mammals for which complete gene data are available: chimpanzee, mouse, rat and dog. Cross-species comparisons show a large variation in the numbers of C2H2-ZNF genes within homologous mammalian clusters, suggesting differential patterns of evolution. Phylogenetic analysis of selected clusters reveals that the disparity in C2H2-ZNF gene repertoires across mammals originates not only from differential gene duplication but also from gene loss. Further, we discovered variations among orthologs in the number of zinc finger motifs and in the association of effector domains, the latter often undergoing sequence degeneration. Combined with phylogenetic studies, physical maps and an analysis of the exon-intron organization of genes from the SCAN and KRAB domain-containing subfamilies, this result suggests that the SCAN subfamily emerged first, followed by the SCAN-KRAB and finally by the KRAB subfamily.

Conclusion: Our results are in agreement with the "birth and death hypothesis" for the evolution of C2H2-ZNF genes, but also show that this hypothesis alone cannot explain the considerable evolutionary variation within the subfamilies of these genes in mammals. We, therefore, propose a new model involving the interdependent evolution of C2H2-ZNF gene subfamilies.

Figures

Figure 1
Flowchart overview of the approach used in the study.
Figure 2
Distribution of singletons and clustered genes across the human C2H2-ZNF subfamilies, and gene composition of the C2H2-ZNF clusters. A) The number of genes belonging to each C2H2-ZNF subfamily is shown, together with the proportion of genes found as singletons or as part of clusters. C2H2-ZNF genes associated with KRAB and SCAN domains are more often found in clusters. S-K = C2H2-ZNF containing both a SCAN and a KRAB domain. NONE = C2H2-ZNF without any associated conserved domain. The percentage distribution is given above each bar for each subfamily. B) The number of C2H2-ZNF clusters is shown with respect to the number of genes present in each cluster. The proportion of clusters composed solely of C2H2-ZNF genes, without any intervening gene, and of clusters with intervening genes other than C2H2-ZNF (Non-C2H2-ZNF) is also represented. An asterisk identifies large clusters present on human chromosome 19.
Figure 3
Differential expansion and loss of C2H2-ZNF clusters in five mammalian genomes. A) Evolution of the C2H2-ZNF repertoires in primates, rodents and dog. The number of C2H2-ZNF clusters and the total number of C2H2-ZNF genes found in these clusters are mentioned on the species tree. Since Xenopus laevis and Gallus gallus C2H2-ZNF genes are used as an outgroup in phylogenetic studies, these species are also positioned on the tree. B) A graphical representation of different scenarios seen in the evolution of human C2H2-ZNF clusters and their syntenically homologous C2H2-ZNF clusters in chimpanzee, mouse, rat and dog. The human clusters selected and named on the graph as well as their syntenic counterparts were 1) present in all species, 2) primate-specific, 3) lost in rodents or 4) absent in dog. For each human C2H2-ZNF cluster named on the graph and described in Additional File 3, the first number indicates the chromosome number and the second is the number attributed to that cluster on the chromosome. Additional File 5 provides a more comprehensive graphical representation including the 40 human clusters that contain at least 3 C2H2-ZNF genes and their syntenic counterparts in the four other mammals.
Figure 4
Evolutionary scenarios seen in phylogenetic trees. A) Species tree showing the evolutionary relationships among species 1, 2, 3 and 4. B) A species-specific gain of genes appears as a clade including a single homolog from one species and multiple homologs from the other. The phylogeny of genes from species 1, 2, 3 and 4 is shown; gene gain is observed in species 4. C) Species-specific gene loss appears as the absence of a corresponding ortholog for one species on the tree and is deduced from the evolutionary relationships between that species and the other species considered. Here, gene loss occurred in species 3.
Figure 5
Phylogenetic analysis of C2H2-ZNF genes in human cluster 19.12 and its syntenic counterparts in other mammals. A phylogenetic tree was built using the amino acid sequences corresponding to the zinc finger regions of the various human C2H2-ZNF genes from cluster 19.12 and their syntenic counterparts in chimpanzee, mouse, rat and dog. The tree was generated using a maximum likelihood method (RAxML) and verified using a Bayesian method (MrBayes). In total, 346 sites from 101 sequences (including the 20 outgroup sequences from chicken and Xenopus) were used in the analysis. The tree is divided into three major groups (I-III). The number of genes present in each group is tabulated for each species (h: human, p: chimpanzee, m: mouse, r: rat, c: dog). Bootstrap values are indicated for each node on the tree. A small black circle marks each node where the posterior probability value is equal to 1.00. This cluster contains only C2H2-ZNF genes that are either from the KRAB subfamily or that do not encode any conserved N-terminal domain. Next to the name of each C2H2-ZNF gene, the presence of an N-terminal KRAB domain is indicated by a K, and the number of zinc finger motifs is given. Clear evidence of differential expansion is seen in primates and dog. Loss of C2H2-ZNF genes in the rodent lineage is also observed.
Figure 6
Physical maps showing the organization of the human C2H2-ZNF genes from cluster 19.12, localized on 19q13.4, and its syntenically homologous counterparts in other mammals. For the large C2H2-ZNF cluster 19.12 and its syntenically homologous counterparts in chimpanzee, mouse, rat and dog, each C2H2-ZNF gene is represented by an open arrow that indicates its orientation on the chromosome strands (this excludes the pseudogenes, whose names appear in parentheses). The presence of a conserved N-terminal KRAB domain is indicated by a square positioned in front of the open arrow representing the gene. Genes identified as orthologs, based on the phylogenetic tree and physical maps, are underlined and aligned vertically on their respective chromosomes. Dotted lines separate the genes belonging to Group I, Group II and Group III defined in the phylogenetic tree (Figure 5). The two species-specific groups from dog and primates are seen in Group I and Group II, respectively.
Figure 7
Variation in the numbers of zinc finger motifs in mammals and in the presence of conserved N-terminal domains in orthologs. A) The average number of zinc finger motifs was calculated for all the C2H2-ZNF genes from the 81 human clusters identified and their corresponding syntenically homologous clusters in the other mammals. For each species, the average number of zinc finger motifs for the total C2H2-ZNF genes (All) and for members of the various C2H2-ZNF subfamilies is presented. For each category, the number of genes in each species is listed above the bars in the following order: human, chimpanzee, mouse, rat and dog. B) For the human C2H2-ZNF cluster 6.2 (chromosome 6p22.1) and its syntenically homologous counterparts in chimpanzee, mouse, rat and dog, each C2H2-ZNF gene is represented by an open arrow that indicates its orientation on the chromosome strands; this excludes the pseudogenes, whose names appear in parentheses. These clusters contain C2H2-ZNF genes that are from the KRAB or SCAN subfamily or that do not encode any conserved N-terminal domain. The presence of a conserved N-terminal domain is indicated by a square for a KRAB domain or an open circle for a SCAN domain, both positioned in front of the open arrow representing the gene. Genes identified as orthologs, based on the phylogenetic tree and physical maps, are aligned vertically on their respective chromosomes. Cases where orthologs from the different mammals do not consistently share the same effector domain(s) are marked by a grey box. C) Exon-intron organization of most human C2H2-ZNF genes from the SCAN-KRAB and SCAN subfamilies. 80% of SCAN-KRAB (11/14) and 55% of SCAN (16/29) C2H2-ZNF genes found in clusters in human have the exon-intron structures shown. The exons encoding the SCAN, KRAB (A box) and ZNF domains are indicated.
Figure 8
Model for the evolution of the SCAN, SCAN-KRAB and KRAB C2H2-ZNF subfamilies. A) Sequential events leading to the birth of the SCAN, SCAN-KRAB and KRAB C2H2-ZNF subfamilies. Most of the SCAN, SCAN-KRAB and KRAB C2H2-ZNF genes have the exon-intron structure shown (boxes represent exons). The birth of new subfamilies presumably occurred by an exon shuffling mechanism, leading first to the acquisition of a SCAN domain by a C2H2-ZNF gene and later to the acquisition of a KRAB domain by a SCAN C2H2-ZNF gene. In most SCAN-KRAB C2H2-ZNF genes, a single exon is found between the exon encoding the KRAB A box (identified as KRAB) and the exon encoding the zinc finger domain (ZNF). In most instances, this exon encodes the so-called KRAB B, b, or C boxes. Emergence of the KRAB C2H2-ZNF subfamily involved the loss of the SCAN domain from SCAN-KRAB gene(s). B) Dynamic evolution of C2H2-ZNF genes after the birth of the SCAN and SCAN-KRAB subfamilies through gene duplication and recurrent loss of effector domains. A first SCAN C2H2-ZNF gene appeared in an ancestor of vertebrates following the gain of a SCAN domain by a C2H2-ZNF gene; duplication then led to the establishment of the SCAN C2H2-ZNF subfamily. The gain of a KRAB domain by a SCAN C2H2-ZNF gene at the emergence of tetrapods gave rise to a SCAN-KRAB C2H2-ZNF gene. This was followed by duplication and establishment of the SCAN-KRAB subfamily. Loss of the SCAN domain by deletion or sequence degeneration from some SCAN-KRAB C2H2-ZNF genes, followed in many instances by duplication of the resulting KRAB C2H2-ZNF genes, led to the expansion of the KRAB C2H2-ZNF subfamily. Loss of SCAN or KRAB domains by deletion or degeneration from the SCAN, SCAN-KRAB and KRAB C2H2-ZNF subfamilies is seen as a recurrent theme shaping the repertoires of the C2H2-ZNF subfamilies.
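
The reading rules described in the Figure 4 caption (a species-specific gain shows up as multiple homologs from one species within a clade, a loss as a missing ortholog) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `classify_clade`, the species labels `sp1` to `sp4`, and the simple copy-count criterion are assumptions made only for illustration. A real analysis would reconcile the gene tree with the species tree before calling a gain or a loss.

```python
from collections import Counter

# Hypothetical species labels standing in for species 1-4 of Figure 4.
SPECIES = ["sp1", "sp2", "sp3", "sp4"]


def classify_clade(leaf_species, expected_species):
    """Summarise one clade of a gene tree by per-species copy number.

    leaf_species: species label of every gene (leaf) in the clade.
    expected_species: species expected to be represented in the clade.
    Returns candidate gains (more than one copy) and candidate
    losses (no copy). This is a heuristic, not a tree reconciliation.
    """
    counts = Counter(leaf_species)  # a missing species counts as 0
    return {
        "candidate_gain": sorted(s for s in expected_species if counts[s] > 1),
        "candidate_loss": sorted(s for s in expected_species if counts[s] == 0),
    }


# Panel B scenario: one homolog in species 1-3, several in species 4.
gain_example = classify_clade(
    ["sp1", "sp2", "sp3", "sp4", "sp4", "sp4"], SPECIES
)

# Panel C scenario: no ortholog recovered for species 3.
loss_example = classify_clade(["sp1", "sp2", "sp4"], SPECIES)

print(gain_example)
print(loss_example)
```

A copy-number count of this kind only flags candidates. An absent ortholog can also reflect incomplete annotation, which is why the caption states that loss is deduced from the relationships among the species considered.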