Skip to main content
Proceedings of the National Academy of Sciences of the United States of America logoLink to Proceedings of the National Academy of Sciences of the United States of America
. 2015 Oct 19;112(45):13970–13975. doi: 10.1073/pnas.1515937112

Molecular signatures of plastic phenotypes in two eusocial insect species with simple societies

Solenn Patalano a,1, Anna Vlasova b,c, Chris Wyatt b,c,d, Philip Ewels a,e, Francisco Camara b,c, Pedro G Ferreira b,f,g, Claire L Asher h,i, Tomasz P Jurkowski j, Anne Segonds-Pichon a, Martin Bachman k,l, Irene González-Navarrete b,c, André E Minoche b,c,m, Felix Krueger a, Ernesto Lowy b,c, Marina Marcet-Houben b,c, Jose Luis Rodriguez-Ales b,c, Fabio S Nascimento n, Shankar Balasubramanian k,l,o, Toni Gabaldon b,c,p, James E Tarver q, Simon Andrews a, Heinz Himmelbauer b,c,r, William O H Hughes i,s, Roderic Guigó b,c, Wolf Reik a,t,u,1, Seirian Sumner d,h,1
PMCID: PMC4653166  PMID: 26483466

Significance

In eusocial insect societies, such as ants and some bees and wasps, phenotypes are highly plastic, generating alternative phenotypes (queens and workers) from the same genome. The greatest plasticity is found in simple insect societies, in which individuals can switch between phenotypes as adults. The genomic, transcriptional, and epigenetic underpinnings of such plasticity are largely unknown. In contrast to the complex societies of the honeybee, we find that simple insect societies lack distinct transcriptional differentiation between phenotypes and coherently patterned DNA methylomes. Instead, alternative phenotypes are largely defined by subtle transcriptional network organization. These traits may facilitate genomic plasticity. These insights and resources will stimulate new approaches and hypotheses that will help to unravel the genomic processes that create phenotypic plasticity.

Keywords: social evolution, phenotypic plasticity, genome sequencing, transcriptomes, DNA methylation

Abstract

Phenotypic plasticity is important in adaptation and shapes the evolution of organisms. However, we understand little about what aspects of the genome are important in facilitating plasticity. Eusocial insect societies produce plastic phenotypes from the same genome, as reproductives (queens) and nonreproductives (workers). The greatest plasticity is found in the simple eusocial insect societies in which individuals retain the ability to switch between reproductive and nonreproductive phenotypes as adults. We lack comprehensive data on the molecular basis of plastic phenotypes. Here, we sequenced genomes, microRNAs (miRNAs), and multiple transcriptomes and methylomes from individual brains in a wasp (Polistes canadensis) and an ant (Dinoponera quadriceps) that live in simple eusocial societies. In both species, we found few differences between phenotypes at the transcriptional level, with little functional specialization, and no evidence that phenotype-specific gene expression is driven by DNA methylation or miRNAs. Instead, phenotypic differentiation was defined more subtly by nonrandom transcriptional network organization, with roles in these networks for both conserved and taxon-restricted genes. The general lack of highly methylated regions or methylome patterning in both species may be an important mechanism for achieving plasticity among phenotypes during adulthood. These findings define previously unidentified hypotheses on the genomic processes that facilitate plasticity and suggest that the molecular hallmarks of social behavior are likely to differ with the level of social complexity.


Phenotypic plasticity allows organisms to maintain fitness in a changing environment. Plasticity influences organismal ecological resilience, adaptability, evolutionary innovations, and speciation (1, 2). However, we understand little about the molecular signatures (the genes involved and differential regulation thereof) of such plasticity. Determining the molecular basis of phenotypic plasticity is fundamental to our understanding of the building blocks of life and has the potential to uncover insights into selection for adaptive function and phenotypic innovation (35).

The profound action of evolution in the generation of biological diversity can be discerned from the genome (6). However, genome sequence alone is not sufficient to explain diverse phenotypic variation because such analyses infer associations based on gene evolution and gene sharing rather than directly identifying differentially expressed genes (DEGs) in the phenotypes of interest (7). Here, in addition to genome and microRNA (miRNA) sequencing, we use deep transcriptome and methylome sequencing of single brains from alternative phenotypes to determine the differential molecular processes associated with highly plastic phenotypes in two species of eusocial insects (8).

Hymenopteran eusocial insects exhibit enormous interspecific variation in phenotypic plasticity, in the form of reproductive (queen) and nonreproductive (worker) phenotypes (9), across multiple independent origins (10). Our two study species (the dinosaur ant Dinoponera quadriceps and the paper wasp Polistes canadensis) exhibit very simple societies, where individuals retain the ability to switch phenotype (11, 12). This characteristic contrasts with the adult honey bee Apis mellifera and most ants, which exhibit low levels of phenotypic plasticity and have been the focus of most previous molecular analyses (13). Our two study species share similar levels of plasticity among individuals, with a single reproductive egg-layer (“gamergate” in D. quadriceps and “queen” in P. canadensis) that is morphologically identical to the nonreproductives; if the reproductive dies, it is quickly replaced by one of the nonreproductives. Both species share many ecological traits but evolved social phenotypes independently (14, 15) (Dataset S1). As such, we present two independent studies on the molecular basis of highly plastic phenotypes in these simple societies (Fig. 1 A and B).

Fig. 1.

Fig. 1.

Genome sequencing and organization. P. canadensis (A) and D. quadriceps (B) share similar ecological, social, and behavioral traits (Dataset S1). (C) P. canadensis shares more similarity in predicted proteins with bees (Apidae) than ants (Formicidae), as expected, given the lack of other published aculeate wasp genome sequences; D. quadriceps shares greatest similarity of predicted protein sequences with sequenced ant genomes (Formicidae). These data are derived from computational protein analyses (SI Text, section SII.7). (D) Distribution of different classes of repetitive elements and transposons across P. canadensis and D. quadriceps genomes.

Our aims were threefold. First, we sequenced the genomes of P. canadensis and D. quadriceps to provide genomic baseline data for eusocial insect species with simple societies, including the first aculeate wasp genome sequence. Second, we sequenced and analyzed individual brain transcriptomes to identify differential transcription patterns associated with phenotypes. Third, we sequenced global miRNAs and individual-level phenotype-specific brain methylomes to determine the extent to which these putative regulators associate with phenotypic differentiation and genomic organization. These analyses highlight fundamental traits of the molecular basis of phenotypic differentiation and plasticity of similar phenotypes apparent in both species. As such, these data provide the first genome sequence, to our knowledge, for an aculeate wasp; provide a framework and hypotheses for revealing the molecular signatures of caste evolution; and, more generally, help define scenarios where conserved or contrasting molecular processes in phenotypic evolution might be used.

Results and Discussion

Typical Insect Genome Composition and Organization.

A single haploid male for each species was sequenced on the Illumina platform achieving 110-fold coverage. The de novo assembled P. canadensis and D. quadriceps genomes were 211 Mega-basepairs (Mbp) and 268 Mbp in size, respectively (SI Text, sections I, II.1, and II.2). These genome sequences are almost complete, with 97–99% of the conserved cluster of orthologous proteins mapped in the two genomes; 79–86% of proteins were annotated (Fig. S1 AD and SI Text, sections II.3II.5). The genome compositions were similar to the genome sequences of other social insects, with D. quadriceps sharing more of its predicted protein content with other ants (Formicidae), whereas P. canadensis shows more equitable levels of protein sharing with ants (Formicidae) and bees (Apidae) (Fig. 1C; Fig. S1E; SI Text, section II.6; and Dataset S1). This difference is likely to reflect the absence of any other aculeate wasp genome sequence in the public domain. Finally, the genome of P. canadensis contains more transposable elements (452,247, 12% of the genome) than D. quadriceps (217,417, 6% of the genome), most of which are simple or low-complexity repeats (Fig. 1D and SI Text, section II.7). Transposable elements were recently identified as potentially important in the evolution of social complexity in bees (6).

Fig. S1.

Fig. S1.

Functional annotation of P. canadensis and D. quadriceps genomes and reconstruction of the hymenopteran phylogeny. (AD) Functional annotation of P. canadensis and D. quadriceps genomes. (E) Reconstruction of the hymenopteran phylogeny across species for which genome sequences are publically available, using 600 shared proteins and including P. canadensis and D. quadriceps, places ants (Formicidae) basal to aculeate wasps and bees. The ancestor of the P. canadensis lineage is likely to have been a solitary species (black dot) (80), whereas the ancestor of D. quadriceps was a species that had developmental caste determination (red dot) (81).

Low Levels of Transcriptional Differentiation Between Phenotypes.

We obtained over 100 gigabase pairs (Gbps) of brain transcriptome sequence data from 23 individual adult female brains (four to seven biological replicates each of reproductives and nonreproductives per species), generating, on average, 3.6 Mbp (20.29 ± 0.67-fold coverage) and 4.9 Mbp (17.4 ± 1.36-fold coverage) per individual for the wasp and ant, respectively (SI Text, sections III.1 and III.2 and Dataset S2). In both species, we found fewer than 1% of genes were differentially expressed (DEGs), with little evidence of functional specialization between phenotypes (5). Using the union of DEGs from EdgeR [parametric approach (16)] and NOISeq [nonparametric approach (17)] (Fig. 2; Table 1; SI Text, section III.3; and Dataset S2), we found 67 (0.4%) DEGs in P. canadensis and 147 (0.8%) DEGs in D. quadriceps. In both species, the nonparametric approach identified significantly more up-regulated genes in reproductives relative to nonreproductives (χ2 = 31, P = 2.2e-08; Table 1). In P. canadensis, gene expression in nonreproductives was found to be more stochastic (noisy) than in reproductives despite similar variance of expression among the biological replicates (Fig. S2). Recent research suggests that evolution can shape noise in gene expression and that such noise can be adaptive and heritable (1820). If noise in transcription is an indicator of phenotypic plasticity (2123), our results would suggest that transcription in the nonreproductive phenotype is more responsive to changes in the biotic and social environment than transcription in the reproductive phenotype. Despite the small number of DEGs, significant functional enrichment of DEGs was detected in the ant reproductives, with 29 gene ontology terms significantly enriched for functions that included metabolic and ribosomal processes, regulation of expression, and an extracellular component [false discovery rate (FDR) < 0.5; SI Text, section III.4 and Dataset S2]. There was little sign of functional enrichment in the wasp (5) (although before FDR correction, oxidoreductase activity and lipid transport were overrepresented in reproductives). These data suggest there is little phenotypic specialization in the brain tissue of either species.

Fig. 2.

Fig. 2.

Low levels of transcriptional differentiation between phenotypes. (A and B) Counts per million plots of log fold mean gene expression differences between phenotypes, showing the numbers and log fold differences of DEGs up-regulated in reproductives (positive) and nonreproductives (negative). The union and individual results of two methods for detecting DEGs (NOISeq and EdgeR) are presented. FC, fold change.

Table 1.

Transcriptome analyses

Species (no. of genes analyzed) DEGs Caste NOISeq EdgeR Combined DEGs (% of total DEGs) TRGs (% of DEGs) Putative noncoding
P. canadensis (16,997) 67 Reproductive 56 36 64 (95) 5 (7.8) 2
Nonreproductive 0 3 3 (5) 0 0
D. quadriceps (16,503) 147 Reproductive 29 55 74 (50.4) 9 (12.2) 2
Nonreproductive 2 74 73 (49.6) 7 (9.6) 3

Two statistical methods were used to detect DEGs. EdgeR (a parametric approach) recognizes significant differences in gene expression when there is a large difference between the means of the groups. NOISeq-BIO (a nonparametric approach) tolerates much lower fold differences in gene expression between groups (as long as the ranges of the two conditions have little overlap) or if one of the groups shows evidence of high gene expression variation (“noisy” gene expression). The numbers of genes identified using Edge R and NOISeq correspond to genes that were significantly up-regulated in reproductives or nonreproductives, as indicated. The number (and percentage as function of total DEGs) of TGRs and their coding potential are also given (full data are available in Dataset S2). Combined DEGs correspond to the union of genes detected by both EdgeR and NOISeq methods (note that some genes were detected by both methods).

Fig. S2.

Fig. S2.

Differences between NOISeq and EdgeR in identification of DEGs and analysis of variance. (A) Box plots of the distribution of the variance from DEGs detected by NOISeq according to reproductive phenotype in each species. P values shown are for t tests. (B) Examples of highly variable gene expression in the nonreproductive phenotype of P. canadensis. Each individual is represented by a dot. (C) Box-whisker plots show examples of different patterns of gene calling using NOISeq or EdgeR in D. quadriceps. (i and ii) Ribosomal proteins are not found differentially expressed with either EdgeR or NOISeq. (iii and iv) Genes found only by EdgeR. (v and vi) Genes found only by NOISeq (ribosomal proteins). (D) Nonlinear RL was obtained based on an exponential decay model curve. For each gene, distances from the coefficient of variation (CV) and regression line (RL) were plotted and values below 0.05 of √ (reproductive^2 square + nonreproductive^2) were excluded to show only highly variable transcripts between reproductive phenotypes. Statistical tests (χ2) were performed comparing highly variable and nonvariable transcripts between each reproductive phenotype for each species. NON-REPRO, nonreproductive; REPRO, reproductive.

No Distinct Methylation Patterning Across the Genome or Between Phenotypes.

We sequenced the methylomes from three biological replicates each of individual adult brains from reproductive and nonreproductive phenotypes in P. canadensis and D. quadriceps using whole-genome bisulfite sequencing [BS-seq; 20 gigabase (GB) (>10-fold coverage) per brain] (SI Text, section IV.1 and Dataset S3).

We compared methylation patterns with the honey bee (24) to provide a reference point because the honey bee is the only close relative to our study species with comparable data on brain methylation available (SI Text, section IV.2). Global levels of methylation in the cytosine-guanine (CG) context were similar in both species, and similar to the honey bee (Table 2). P. canadensis exhibited greater methylation in the non-CG context but significantly fewer highly methylated regions than D. quadriceps (Table 2; Fig. S3 A and B; and SI Text, section IV.3). However, in comparison to the honey bee, both species showed relatively little gene body-specific methylation targeting (Fig. 3A; Table 2; Fig. S3C; and SI Text, section IV.4), together with a striking lack of consistently fully methylated cytosines (Fig. 3B). In both P. canadensis and D. quadriceps, DNA methylation is dispersed sparsely across genes (Fig. 3C), particularly in P. canadensis, whose genome lacks a DNA methyltransferase 3 (DNMT3) gene, an enzyme involved in de novo methylation (25, 26) (Fig. S4A and S5A and SI Text, section IV.5). In P. canadensis, we also found a prevalence of asymmetrical (one strand only) CG methylation, together with a variant of the DNA methylation transferase 1 (DNMT1) gene involved in the maintenance of DNA methylation, in its genome (Fig. S4 BD and SI Text, section IV.5). As observed in A. mellifera brains (27), both study species possess and express a ten-eleven translocation (TET) hydroxylase gene and base excision repair genes involved in demethylation, and have detectable hydroxymethylation in brain tissue (Figs. S4F and S5 and SI Text, section IV.6). Together, these general features provide an epigenetic landscape that may facilitate plasticity of genome function.

Table 2.

Methylation analyses

Species Context Total 1-kb regions analyzed Global methylation, % No. of HMRs (% of total 1 kb) Methylated genes HMR gene enrichment
P. canadensis CG 172,660 2.79 1,060 (0.6) 731 1.8-fold
Non-CG 192,001 3.62 1,057 (0.6) 314 1-fold
D. quadriceps CG 244,626 3.01 7,065 (2.9) 4,360 1.8-fold
Non-CG 245,493 1.26 12 (0.0) 6 0.7-fold
A. mellifera (24) CG 194,707 2.29 8,046 (4.1) 3,861 2.7-fold
Non-CG 212,728 0.11 0 (0.0) 0 0-fold

A minimum of 10% methylation level per 1-kb probe was used as a threshold to identify highly methylated regions (HMRs) in our two study species and in the honey bee A. mellifera as a comparison. Genes were defined as methylated if at least one HMR overlapped with a gene body (SI Text, section IV.2; full data are available in Dataset S3).

Fig. S3.

Fig. S3.

Context and distribution of DNA methylation. (A) Non-CG methylation validation. Both CHH and CHG contexts were analyzed. The levels of the non-CG methylation wrongly assigned due to genomic CpG SNPs represent 0.15% (±0.03) and 0.03% (±0.01) of the CHH context and 0.09% (±0.02) and 0.03% (±0.01) of the CHG context in P. canadensis and D. quadriceps, respectively. (B) Base-pair methylation distribution (coverage ≥ 5) follows the HMR pattern from Fig. 3B and confirms the absence of highly methylated cytosines in P. canadensis and D. quadriceps. (C) Average methylation distribution along gene bodies and their 200-kb adjacent sequences in different species of insects (blue, CG methylation; yellow, non-CG methylation; references are provided in Dataset S4). (D) Methylation levels across different genomic features and cytosine contexts were determined across six individuals for each species. (E) Smoothed scatter plots of methylation of the top and bottom DNA strands. Adjusted R2 values are as follows: 0.08745 for P. canadensis, 0.9037 for D. quadriceps, and 0.7816 for A. mellifera. Only Cyts covered by ≥20 reads are shown. OB, original bottom strands; OT, original top strands.

Fig. 3.

Fig. 3.

Absence of distinct DNA methylation patterning. (A) Average CG methylation level in brain tissue along gene bodies and 20 kb of adjacent sequence for P. canadensis (green), D. quadriceps (blue), and A. mellifera (yellow). (B) Proportion of methylated cytosines within highly methylated regions (HMRs). Hartigan’s dip test for unimodality: D = 0.0184 in P. canadensis, D = 0.0257 in D. quadriceps, and D = 0.0849 in A. mellifera (P < 0.0001 in all three species). (C) Screen shot from SeqMonk software showing the distribution of CG methylation in an orthologous gene in each of the three species. (D) Venn diagram of methylated orthologs: 74.5% (321 of 431) of the methylated genes in P. canadensis (green) overlap with D. quadriceps (blue). (E) Methylation distribution and summary box plots of the DEGs, ASGs, and non-ASGs, tested with Welch two-sample t tests. ns, not significant; RPKM, reads per kilobase per million. ***P < 0.0001. (F) Splicing entropy of annotated transcript isoforms. Shannon entropy grows with the number of annotated isoforms and with their equifrequency (entropy is 0 when only one isoform is expressed and high when all isoforms are expressed equally, Welch two-sample t test).

Fig. S4.

Fig. S4.

Synteny of the DNMT3 locus across eusocial insects and genetic comparison of DNMT1 and ten-eleven-translocation (TET) gene sequences. (A) Synteny analyses in the DNMT3 locus across eusocial insects [bee (A. mellifera) and ants (Atta cephalotes, Camponotus floridanus, Acromyrmex echinatior, Harpegnathos saltator, and Linepithema humile)]. Different colors of the bars in the plot represent noninterrupted genomic fragments, which are homologous between the compared species. (B) Schematic representation of domain conservation between Homo sapiens and P. canadensis of the DNMT1 enzyme. (C) Percentage of CXXC homology between P. canadensis and D. quadriceps TET, DNMT1, and an insect-specific CXXC-containing domain. (D) CXXC sequence alignment of DNMT1. Gray arrows highlight the exchange found only in P. canadensis. (E) Three major amino acid changes (R to K, “other” to F and “other” to I) were mapped in the mouse CXXC domain bound to unmethylated DNA. (F) Schematic representation of TET genes in P. canadensis (PCAN011a000021) and D. quadriceps (DQUA011a009260). The CXXC domain and 20GFeDO [Fe(II)-dependent oxygenase] domain sequences are aligned between the two insect species and with the Mus m. domesticus TET sequence.

Fig. S5.

Fig. S5.

Phylogeny, expression, and activity detection of the DNA methylation and demethylation machinery. (A) Phylogenomic tree of the key proteins involved in DNA methylation and demethylation across insects. (B) Brain gene expression levels of the DNA methylation and demethylation machinery based on RNA-seq datasets. (C) Ratio of 5mC/5-hydroxymethyl-cytosine (5hmC) in genomic DNA as measured by MS. Sample sizes are seven, five, and four individuals in P. canadensis, D. quadriceps, and M. m. domesticus, respectively.

Despite the general paucity of methylation patterning, we found significant conservation of methylated orthologs (Fig. 3D; SI Text, section IV.7; and Dataset S3) and a positive correlation between gene expression and CG methylated genes (Fig. S6 and SI Text, section IV.2), as seen before in other insect species (2833). Notably, however, DEGs tended to be hypomethylated in both species (Fig. 3E), and unlike the case in brain methylomes of adult honey bees (24, 34, 35), we found no evidence that phenotypes were associated with differentially methylated genes in our two species (t test, P > 0.05; SI Text, section IV.8). Analyses of alternative splicing revealed only 28 phenotype-specific isoforms expressed in D. quadriceps and none in P. canadensis (SI Text, section IV.9 and Dataset S3). This similarity between phenotypes is likely due to the global tendency of these species to express all isoforms simultaneously (Fig. 3F). Similar to DEGs, alternatively spliced genes (ASGs) were also hypomethylated compared with non-ASGs (Fig. 3E). This result may limit the role of DNA methylation in regulating phenotype-associated gene expression or alternative splicing in our species, and contrasts with what has been described in the honey bee (26, 3539).

Fig. S6.

Fig. S6.

CG islands description and association between methylation and expression and CG density. (A) CG density and distribution around the transcription start site (TSS). Data were analyzed using repeated measures one-way ANOVA, followed by Tukey post hoc tests. The difference between the TSS and ±2 kb are all significant for all four species (P < 0.0001). However, the baseline levels (±) and increase to the TSS vary widely between species. The increase expressed as a percentage for each species is shown in each graph. (B) Scatter plots of averages of gene body methylation vs. their transcriptional activity in log2 RPKM. Each data point is an individual gene. (C) Scatter plots showing percentage of CG within genes vs. their methylation level. Each point is an individual gene.

MicroRNAs Are Not Preferentially Targeting Differentially Expressed Genes.

Species-specific miRNA libraries were constructed from pools of individuals to include each phenotype to determine whether large numbers of miRNAs are shared between hymenopterans to the exclusion of the other insects and to identify potential cis-regulatory elements of DEGs. From our miRNA libraries, we identified 159 miRNA families (73 in P. canadensis and 86 in D. quadriceps), including 15 previously undescribed families (Fig. S7; SI Text, section V; and Dataset S4). We identified four families that are unique to hymenopterans and an additional nine families that were shared by apocritans to the exclusion of Nasonia and other insects. We found that miRNAs (40) were not preferentially targeting phenotype-specific DEGs, because although some DEGs appeared to be highly targeted, others were not (Dataset S4). Further work is needed to investigate miRNA expression levels in large numbers of individual queens and workers to rule out a role for miRNAs in caste differentiation.

Fig. S7.

Fig. S7.

The evolutionary history of arthropod microRNA families, based on ref. 78 and data in this study. Both gains and losses of miRNA families are variable between taxa, and although a higher rate of loss is observed within Hymenoptera than Diptera, this probably reflects the greater depth of genome and small RNA sequencing within Dipterans. (Box) MicroRNA targeting of both the differentially expressed genes (DEGs) and non-DEGs using the program TargetSpy. Candidate miRNA products (both 5′ and 3′) and 3′ UTR sequences for all non-DEGs (n = 2,223 for P. canadensis, and 3,121 for D. quadriceps) and DEGs (n = 33 for P. canadensis, and 35 for D. quadriceps) were used to identify the target sites. As 3′ UTR length was variable (D. quadriceps: non-DEGs = 319 nt, DEGs = 372 nt; P. canadensis: non-DEGs = 313 nt, DEGs = 429 nt), and the longer the 3′ UTR, the greater the likelihood of target sites being identified, the data were normalized per 100 nucleotides. These data showed no difference in the number of miRNA target sites between either the non-DEGs (2.8 hits per 100 nucleotides, SD = 1.5) or the DEGs (3.0 hits per 100 nucleotides, SD = 1.4) in P. canadensis or the non-DEGs (3.4 hits per 100 nucleotides, SD = 2.1) and the DEGs (3.8 hits per 100 nucleotides, SD = 2.1) in D. quadriceps.

Role for Conserved Toolkit Genes and Taxon-Restricted Genes in Regulatory Networks.

Despite the low numbers of DEGs, we found evidence that DEGs were nonrandomly organized at the network level in both species. Weighted gene correlation network analyses identify groups of genes that covary significantly in expression as “modules” (41). These analyses identified 31 and 41 gene coexpression networks for the ant and wasp, respectively (SI Text, section SIII.5 and Dataset S5). DEGs were clustered nonrandomly across networks in both species (Fig. 4A). Only three (10%) and two (5%) network modules showed significant overrepresentation of DEGs in the ant and wasp, respectively, and only one network module in the ant showed evidence of functional enrichment for ribosomal terms (SI Text, section III.5). Phenotype-specific transcription in both species is therefore governed by subtle but coordinated coexpression networks.

Fig. 4.

Fig. 4.

Coordinated transcriptional network organization. (A) DEGs are nonrandomly distributed across modules (groups of genes with similar levels of expression): 14 of 41 DEGs in P. canadensis modules [binomial generalized linear model (glm) × 2[13] = 162; P < 0.0001] and 25 of 31 DEGs in D. quadriceps modules (binomial glm × 2[24] = 288; P < 0.00001). Colors correspond to the different modules. An asterisk indicates the modules that correlate significantly with phenotype. (BF) Network graphs show the connectivity of annotated genes and TRGs in the modules that correlate significantly with phenotype. There were two modules in P. canadensis [yellow module, P = 2.4 × 10−23 (C); red module, P = 14.1 × 10−22 (E)] and three in D. quadriceps [light yellow module, P = 9 × 10−19 (B); magenta module, P = 2.7 × 10−42 (D); dark turquoise module, P = 8.6 × 10−4 (F)]. DEG fold enrichment in module: yellow (9-fold), red (3.6-fold), light yellow (21.5-fold), magenta (5.4-fold), and dark turquoise (7.7-fold). Nodes represent individual genes (with their XLOC gene name given). Edges indicate high coexpression between genes; edges with a correlation below specific thresholds are removed to aid visualization (41) [Thresholds: 0.27–1 (B), 0.31–1 (C), 0.15–1 (D), 0.24–1 (E), 0.12–1 (F)]. Connectivity (number of edges per node above the threshold) is indicated by node size. Annotated DEGs that are hubs [hubs defined as highly connected genes with more than five connections (c > 5)] are shown in red, and taxon-restricted DEGs that are hubs (c > 5) are shown in blue. Toolkit genes and TRG names are highlighted. Three genes that are DEGs in both species were found to be hubs in some networks: myrosinase (c = 16) in P. canadensis and vitellogenin (c = 14) and fibrillin (c = 8) in D. quadriceps.

There is a debate over the relative roles for core sets of conserved genes (4248) and taxon-restricted genes (TRGs) (5, 44, 47, 49, 50) in the evolution of convergent phenotypes (7, 44, 46). We found evidence that both types of gene classes play peripheral roles in the molecular networks associated with phenotypic differentiation in our study species. In each species, we identified both classes of genes among DEGs, determined whether their functions were conserved, and ascertained their putative importance in the gene networks associated with phenotypic differentiation. There were significant levels of overlap in the identity of DEGs between the two species (reciprocal BLAST hits of DEGs; n = 11 genes, P < 0.003 relative to chance for both species; SI Text, section III.3 and Dataset S2), suggesting they are homologs. Some of these genes were the same as those genes that had been previously identified as conserved “toolkit” genes for alternative phenotypes in eusocial insects [e.g., cytochrome P450, vitellogenin, hexamerin-2, kruppel homolog 1 (4248)], but others were not (e.g., fibrillin-like gene; glutaminase, esterase, and myrosinase enzymes; a gene coding for a lysozyme). Gene identity may be conserved, but not the direction of expression (5, 7): Four of 11 genes were worker-biased in the ant, whereas all 11 were queen-biased in the wasp (Dataset S2). Finally, conserved DEGs were not generally highly connected in the coexpression networks of either species (Fig. 4 CF). This observation contrasts with eusocial insect species with phenotypes that are determined irreversibly during development, where conserved genes can play central roles in gene networks (44).

TRGs (those genes having no significant homologs in available genomic databases) were detected in DEG sets in both species (ant: 10%, n = 16; wasp: 7.5%, n = 5) (Fig. 4 CF, Table 1, and Fig. S8) and at similar levels to TRGs across the whole genome [ant (11.6% TRGs): χ2 = 0.11, P = 0.74; wasp (9.1% TRGs): χ2 = 0.52, P = 0.47; Dataset S5]. Taxon-restricted DEGs are likely to be new genes (short in length relative to annotated/known genes) (49) (Fig. S8) with unknown/novel functions (“guilt-by-association” network analysis) (41), because their nearest neighbors were also taxon-restricted (unknown function) (mean = 2.3 of the 10 most connected genes had BLAST hits; Fig. S8 and Dataset S5). Finally, taxon-restricted DEGs had similar low levels of connectivity to conserved genes in the networks of both species (generalized linear model) (ant: binomial errors, P = 0.89; wasp: quasibinomial errors, P = 0.96; Fig. 4 CF), suggesting that conserved genes and previously unidentified TRGs are similarly important in phenotypic differentiation in these two species.

Fig. S8.

Fig. S8.

Comparison of characteristics of taxon-restricted (TRGs) and known (annotated) genes, with respect to their length, expression levels, and distribution among coexpression networks. (A) TRGs (orphans) are similarly expressed compared with annotated genes but tend to be shorter in length in both P. canadensis and D. quadriceps. FPKM values of DEGs (Top) and gene lengths of DEGs (Bottom) are shown [P. canadensis (n = 5) and D. quadriceps (n = 16)]. (B) Length of putative proteins of TRGs compared with all and annotated putative proteins in the genome. AA, amino acid. (C, Left) Distribution of known (shaded) and putative taxon-restricted (hatched) genes in reproductives (red) and nonreproductives (blue) of D. quadriceps and P. canadensis. The number of genes is shown within each area, and the proportion of phenotype-biased genes that each represents within that species is denoted in parentheses. (C, Right) Distribution of TRGs within the modules of weighted coexpression networks. The numbers of DEGs for each phenotype (reproductives in red, nonreproductives in blue) and whether they are taxon-restricted (hatched) or known (shaded) are shown.

These data support the emerging hypothesis that conserved genes, new genes, and/or new regulatory networks are important in the evolution of phenotypic diversity (5, 44, 4751). Our analyses add to this hypothesis by identifying roles for both conserved genes and TRGs in highly plastic phenotypes.

Summary and Conclusions

We sequenced the genomes, miRNAs, multiple brain transcriptomes, and methylomes from two eusocial insect species whose life cycles depend on high phenotypic plasticity throughout life. This data includes the first aculeate wasp genome sequence to our knowledge. Both species displayed three key molecular signatures that may be molecular hallmarks for highly plastic phenotypes in simple eusocial insects. These key molecular signatures are as follows: (i) little molecular differentiation between phenotypes in transcription but subtle nonrandom differentiation at the transcriptional network level; (ii) no evidence of a role for DNA methylation or miRNAs in regulating phenotypic differentiation and an overall lack of distinct methylome patterning, together with evidence of methylation turnover; and (iii) a similar role for both conserved toolkit genes and previously unidentified taxonomically restricted genes in phenotypic differentiation. These characteristics may allow plasticity in the regulation of the genome, and thus facilitate plasticity at the phenotypic level (52). The sequencing of more species with different levels of plasticity and multiple phenotypes will be required to confirm this hypothesis (6). However, the available data suggest that these hallmarks contrast with those hallmarks of eusocial insects with low plasticity like the honey bee and most ants, where a large proportion of genes, functionality, and network differentiation are associated with phenotypic differentiation (44, 5358), and where phenotypes appear to be regulated by DNA methylation (24, 25, 30, 34, 35, 37, 5962). Comparisons of species with contrasting evolutionary histories, as in our study species, will be especially valuable in revealing the molecular signatures at the origin of social evolution (e.g., in P. canadensis) and in reversions from complex to simple behaviors (e.g., in D. quadriceps). Methylome data from the brains of other ant (or wasp) species are not currently available. However, whole-body analyses of two species of ants revealed less defined methylome patterning and fewer differentially methylated genes between reproductive and nonreproductive phenotypes in Harpegnathos (high phenotypic plasticity) compared with Camponotus (lower phenotypic plasticity) (30), in support of our hypothesis. These insights, and the generation of the deep, multifaceted genomic resources for two model organisms with simple societies, help to plug a fundamental gap in our understanding of the molecular basis of phenotypic plasticity and serve to generate novel and important hypotheses on eusocial evolution. A particular focus for future work would be on whether the intriguing lack of coherent DNA methylation patterning and a key member of the enzymatic machinery (DNMT3) as regulators of alternative phenotypes is of general importance in permitting genomes to be highly responsive, as we have seen at the phenotypic level in social species with high phenotypic plasticity.

Methods

Detailed methodology and supporting information on sample collection (SI Text, section I), genome sequencing (SI Text, section II), RNA-sequencing (SI Text, section III), BS-seq (SI Text, section IV), and miRNA sequencing (SI Text, section V) are provided. A dataset (Datasets S1–S5) for every section is also provided.

SI Text

References of the pipelines and software mentioned in this study are available on request.

I. Sample Collections

I.1. P. canadensis.

P. canadensis wasps were collected from 10 wild colonies in Panama in June 2012 at Punta Galeta, Colon (9°21′30.877′′ N, 79°54′0.0053′′ W), under the field collection permit SE/A-20-12 from the Autoridad Nacional del Ambiente. All colonies contained a single queen (henceforth referred to as “reproductives”) that laid all of the eggs, was socially dominant, and rarely left the nest, and many workers (henceforth referred to as “nonreproductives”) that provisioned their sibling brood on their natal colony (63). Colonies were medium-sized (20–40 wasps) postemergence colonies (meaning nonreproductives are the daughters of the reproductives), and had been monitored for several weeks, during which time there had been no change of reproductives. Reproductive phenotypes were identified from observations of reproductives laying eggs, being socially dominant, and rarely leaving the colony, and nonreproductive phenotypes were identified as subordinate individuals that were observed bringing forage prey to the colony and feeding brood. Foragers tend to be the older nonreproductive individuals in their nests (1). Reproductive and nonreproductive wasps were collected directly off their nests with forceps at midday during the active foraging periods; their heads were cut off and immediately placed into RNAlater solution (Ambion), and their bodies were stored in 80% EtOH and kept at −20 °C until dissection of brains and ovaries. Ovary dissections confirmed that reproductives had mature eggs in their ovaries and had mated, whereas nonreproductives had no ovary development but some had mated, as expected in this species (63) (Dataset S2).

I.2. D. quadriceps.

Seven wild colonies of D. quadriceps were collected from the Atlantic Forest in Sergipe (11°01′23′′ S, 37°12′9′′ W) and Campo Formoso (10°2′972′′ S, 40°20′771′′ W) in Brazil from 2009 to 2011 under collection permits 10BR004553/DF and 11BR006471/DF from the Instituto Brasileiro do Meio Ambiente e dos Recursos Naturais. Because this species nests deep in the ground, colonies had to be excavated and kept in the laboratory for several weeks to identify phenotypes. Each ant was fitted with a radio-frequency identification tag that allowed foragers (those ants that left the nest regularly to find forage) to be identified. The dominance rank of each individual was determined using multiple 30-min behavioral observation periods, during which time the type of interaction and the identities of the actor and recipient were recorded. Dominance hierarchies were then constructed from aggressive interactions for each colony (64); high-rank phenotypes were identified by frequent, high-intensity aggressive interactions, and low-rank phenotypes were identified by only rare involvement in aggressive interactions. Low-ranked individuals tend to be the older nonreproductive individuals in the nest (12). Ants were killed in liquid nitrogen; their heads were then transferred into RNAlater solution, and their bodies were transferred to 80% (vol/vol) ethanol for storage at −80 °C until dissection of brains and ovaries. Dissections recorded whether an individual had mature eggs in its ovaries and whether it had mated (Dataset S2). Gamergates (henceforth referred to as reproductives) were identified as the top-ranked dominant individual in their colony, rarely left the nest chamber (i.e., did not forage), had mature eggs in their ovaries, and had mated. Nonreproductives that foraged were identified as some of the lowest ranked (least dominant) females; they frequently left the nest (i.e., foraged), had not mated, and did not have mature eggs in their ovaries.

II. Genome Sequencing and Annotation

II.1. DNA Extraction.

DNA was extracted from a single haploid male of P. canadensis and D. quadriceps using a BioSprint 96DNA Blood Kit from QIAGEN. This kit uses the Mag Attract (QUIAGEN) magnetic particle technology for DNA purification. DNA was eluted in buffer AE (QUIAGEN). Microsatellite analysis was conducted to confirm male haploidy.

II.2. Genome Sequencing and Assembly.

DNA from whole single haploid males was sheared in a Covaris S2 instrument, and Illumina genomic libraries with a peak insert size of 513 bp (P. canadensis) and 542 bp (D. quadriceps) were sequenced on the Illumina HiSeq sequencing system using a 2 × 100-nt paired-end sequencing protocol. Mate-pair libraries with span size of 3.9 kb (P. canadensis) and 5 kb (D. quadriceps) were prepared from the same material. Mate-pair reads were obtained from 2 × 50-nt paired-end sequencing on the Illumina HiSeq platform. Data filtering and genome assembly of Illumina reads were as previously described (65). The soap de novo assembler (version 1.05) with a k-mer size of 49 was used. D. quadriceps mate-pair data contained a high percentage of read-pairs not derived from fragments encompassing the circularization junction. Such nonjunction pairs align in the same orientation and distance as regular Illumina paired-end reads. Nonjunction pairs were removed before scaffolding. To this end, 166.6 million mate-pairs were aligned against a D. quadriceps preassembly that had been calculated based on paired-end data. Read-pairs were kept that mapped uniquely with each read (81.2 million pairs). The mapping distance of read-pairs scattered around 5,000 bp, which is expected for the true mate-pairs, and around 355 bp. The mapping distance of 355 bp was the library insert size, and read-pairs with this distance represent the nonjunction pairs. Thereafter, all pairs with a mapping distance less than 1,000 bases were removed (35.0 million pairs remaining). To reduce potential errors in the scaffolding step, we further discarded duplicated read-pairs based on mapping positions. For the final D. quadriceps assembly, 3.0 million mate-pairs were used. For P. canadensis, an assembly before gap closing was deployed for downstream analysis. For D. quadriceps, we used a gap-closed assembly. Assembly metrics were calculated on all scaffolds and nonscaffold contigs sized 500 bp or larger.

II.3. Estimation of the Gene Space Completeness of the D. quadriceps and P. canadensis Genome Assemblies Using the Core Eukaryotic Genes Mapping Approach Pipeline.

We estimated the completeness of the assembly by looking at a set of “core” orthologous proteins [or “cluster of orthologous groups” (COG)] that were deemed to be highly conserved and exist in low-copy numbers in the large majority of higher eukaryotes. The latest COG set of genes includes 248 sets of “COG” proteins from six divergent eukaryotic species. Even though sequence coverage and the “N50” parameter (a measure of genome assembly quality; 536 kb and 1,359 kb in P. canadensis and D. quadriceps, respectively) are useful for characterizing a genome with regard to “base pairs,” they do not always describe the status of the gene space or whether one will be able to detect and identify genes easily and accurately in the genomic sequence. The Core Eukaryotic Genes Mapping Approach (CEGMA) pipeline has been used to determine the state of the gene space (i.e., another indicator of genome completeness) of several species in the past few years. Most (nearly) complete assemblies [using whole genome sequencing (WGS) technology] include 90% of the 248 COG proteins.

In the case of the D. quadriceps assembly, the CEGMA pipeline was able to map 99.19% (246 of 248) of the conserved COGs at least partially and 97.58% (242 of 248) over their entire length. In the P. canadensis assembly, the CEGMA pipeline mapped 98.79% (245 of 248) of the COGs at full length and 99.6% (247of 248) at least partially. The high percentage of completely detected COGs is a strong indicator that the assemblies, in terms of their “gene space,” can be considered as being of very high quality. Therefore, they were judged to be robust enough for all subsequent steps, leading to obtaining a protein-coding reference gene annotation for both species.

II.4. Obtaining a Protein-Coding Reference Gene Annotation for the P. canadensis and D. quadriceps Genomes.

Protein-coding gene annotation reference sets for the P. canadensis and D. quadriceps genomes were generated by combining the Program to Assemble Spliced Alignments (PASA r2012-06-25) and Evidence Modeler (EVM r2012-06-25) to obtain coding DNA sequences (CDS) models using three main sources of evidence: (i) aligned transcripts, (ii) aligned proteins, and (iii) gene predictions.

II.4.1. Generation of transcriptome assemblies for P. canadensis and D. quadriceps.

P. canadensis transcripts for assembly with PASA (r2012-06-25) were obtained as follows: Newbler (version 2.6) was used to generate a “transcriptome” from a set of pooled RNA-sequencing (RNA-seq) reads derived from different tissues of P. canadensis. Newbler built 45,271 isotigs/contigs (i.e., “transcripts”) corresponding to 26,326 isogroups. D. quadriceps transcripts were obtained by first mapping brain-derived RNA-seq reads to the genome assembly using the RNA-seq read-mapper topHat, followed by the assembly of potential transcripts derived from these alignments using the Cufflinks program. This process resulted in 27,787 transcripts (15,784 genes). The transcriptomes obtained for both species were subsequently added to the PASA (r2012-06-25) database [which uses Genomic Mapping and Alignment Program (GMAP) as the alignment engine]. PASA was set to be quite stringent: Newbler or Cufflinks transcripts with less than 90% identity to the genomic sequence over 90% of their length were discarded so that PASA generated 29,568 and 26,944 transcript assemblies for P. canadensis and D. quadriceps, respectively.

II.4.2. Generation of protein alignments to the P. canadensis and D. quadriceps genomes.

To generate spliced protein alignments for P. canadensis and D. quadriceps, UniProt-90 proteins (for P. canadensis), invertebrate-specific UniProt/Swiss-Prot proteins (for D. quadriceps), Apis mellifera Ensembl protein sequences (for D. quadriceps), and Harpegnathos saltator (a ponerine ant species closely related to D. quadriceps) reference annotation proteins (for D. quadriceps) were split-mapped to the genomes in two steps: first with a BLAST-like alignment tool (BLAT) to find approximate loci, and then with GeneWise to obtain more refined alignments. In addition, we used the spliced-protein alignment tool exonerate to map the A. mellifera Ensembl protein sequences to the scaffolds of both P. canadensis and D. quadriceps. In the case of D. quadriceps, exonerate was also used to map UniProt/Swiss-Prot invertebrates’ annotated proteins (Dataset S1).

II.4.3. Generation of ab initio geneid P. canadensis- and D. quadriceps-specific gene predictions.

Geneid is an ab initio gene prediction program used to find potential protein-coding genes in anonymous genomic sequences. In the context of geneid, training basically consists of computing position weight matrices (PWMs) or Markov models of order 1 for splice sites and start codons, and deriving a model of coding DNA (generally a Markov model of order 5). Furthermore, once a preliminary species-specific matrix is obtained, it is further optimized by adjusting two internal matrix parameters: the cutoff of the scores of the predicted exons (eWF) and the ratio of signal to coding statistics information to be used (oWF).

The initial training sets for both species comprised 1,500 gene models selected from the set of longest ORFs (more than 100 amino acids) corresponding to complete genes produced by the PASA pipeline. Of these gene models, 80% (1,200) were used to train geneid, whereas the remaining 20% (300) were set aside to test the accuracy of the newly developed matrix. The 1,500 D. quadriceps training set protein-coding gene models included 14,356 canonical donor splice sites, 14,524 canonical acceptor sites, and 1,198 start codons. The 1,500 P. canadensis protein-coding gene models contained 5,350 canonical donor splice sites, 5,386 canonical acceptor sites, and 1,197 start codons. These start codons were used to compute PWMs, whereas the donor and acceptors were used to derive Markov matrices of order 1. Given the large number of sequences for both species, we also had enough coding/noncoding nucleotides to derive a Markov model of order 5 for the coding potential. Accuracy of the geneid D. quadriceps and P. canadensis parameter files was tested on evaluation “artificial scaffolds” built for each species, consisting of the 300 evaluation set concatenated gene models with 800 nt of intervening sequence between each of the genes (Dataset S1).

The training of geneid to obtain parameter files for these two species was based on the method described to obtain a Drosophila melanogaster geneid parameter file. Training was performed in a “semiautomated” fashion by using a recently developed geneid training tool (geneidTRAINer1.1).

Generation of homology-derived SGP2 P. canadensis- and D. quadriceps-specific gene predictions.

SGP2 is a syntenic gene prediction tool that combines ab initio gene prediction (geneid) with tblastx (nucleotide six-frame translation-nucleotide six-frame translation) searches between two or more genome sequences to provide both sensitive and specific gene predictions. It tends to improve geneid’s performance, especially by reducing the number of false-positive predictions. SGP2 requires a reference genome to which the target genome (in this case, D. quadriceps or P. canadensis) is compared. We decided to use the genome of A. mellifera (honey bee) as a reference to develop the SGP2 parameter files for both of these species. A. mellifera was selected not only because of the good quality of its assembly but also due to its being at an appropriate evolutionary distance from both genomes being annotated. This evolutionary distance results in the coding regions of the genes having higher conservation than the intergenic and intronic regions of the target/reference genomes being aligned. This feature contributes to the higher accuracy of SGP2 compared with geneid.

The starting points for obtaining parameter files for SGP2 are the previously described geneid matrices. The geneid-derived SGP2 parameter files were optimized on artificial scaffolds comprising the same 1,200 sequences used to train geneid with concatenation, so that 800 “intergenic” nucleotides flanked each of the gene models. SGP2 matrices were optimized by modifying not only the eWF internal parameter (as done for geneid), but also two SGP2-specific internal parameters (“NO_SCORE” and “HSP_factor”). The NO_SCORE parameter provides a penalty for no overlap between tblastx-derived high-scoring pairs (HSPs) and geneid ab initio predictions in the same region, whereas the HSP_factor parameter reduces the score assigned to the HSPs to maximize the prediction accuracy. The optimal values of NO_SCORE and HSP_factor were found to be 0.04 and −0.125, respectively, for D. quadriceps and 0.07 and −0.175, respectively, for P. canadensis. The newly developed SGP2 parameter files were evaluated on the same artificial scaffolds used to evaluate the geneid matrix (Dataset S1).

The strategy used to obtain the SGP2-specific parameter files was based on the methodology described by Parra et al. (66) used to obtain a human SGP2 parameter file utilizing mouse homology evidence.

Generation of ab initio GlimmerHMM P. canadensis- and D. quadriceps-specific gene predictions.

We obtained P. canadensis- and D. quadriceps-specific matrices for the gene prediction tool GlimmerHMM. GlimmerHMM is an ab initio program that is based on a generalized hidden Markov model. This program also incorporates splice site models adapted from the GeneSplicer program and a decision tree adapted from GlimmerM. We generated the species-specific matrices for this program by using a “self-training” script obtained from the University of Maryland (www.cbcb.umd.edu/software/GlimmerHMM). The training was performed by following the instructions provided (available at www.cbcb.umd.edu/software/GlimmerHMM/man.shtml#training). GlimmerHMM was trained on the same set of sequences used to train the gene prediction programs geneid and SGP2. The new GlimmerHMM matrices were also evaluated on the same artificial scaffolds used to evaluate the other ab initio gene prediction parameter files (Dataset S1).

Generation of ab initio GeneMark D. quadriceps-specific gene predictions.

We generated a D. quadriceps-specific matrix for the gene prediction program GeneMark. The GeneMark-Eukaryotic Self-training (ES) algorithm was developed for finding protein-coding genes in eukaryotic genomes without training sets. GeneMark-ES determines species-specific gene-finding parameters using a self-training algorithm based on the species of interest genomic sequence. We generated a D. quadriceps-specific matrix for this program by using the self-training instructions found at the program developer’s website (opal.biology.gatech.edu/). GeneMark was trained on the same set of sequences used to train the gene prediction programs described above. The new geneMark matrix was also evaluated on the same artificial scaffold used to evaluate other ab initio gene prediction parameter files used in this study (Dataset S1).

Generation of ab initio and evidence-derived Augustus P. canadensis- and D. quadriceps-specific gene predictions.

We built P. canadensis- and D. quadriceps-specific matrices for the gene prediction program Augustus. Augustus is a program that predicts genes in eukaryotic genomic sequences and that is also “retrainable.” The program is based on a hidden Markov model and integrates a number of known methods and submodels. To obtain parameter files for Augustus, we used its own training program (www.molecularevolution.org/molevolfiles/exercises/augustus/training.html), and utilized the program to estimate the optimal parameters for P. canadensis and D. quadriceps, given the same 1,200 species-specific genes used in training the other ab initio tools utilized to obtain gene predictions on these species. The resulting Augustus parameter files were evaluated on the same artificial scaffolds consisting of the 300 concatenated gene models with 800 nt of intervening sequence used to evaluate the other programs utilized to predict genes in P. canadensis and D. quadriceps (Dataset S1).

In addition, we obtained a set of predictions that used the P. canadensis and D. quadriceps Augustus parameter files in combination with PASA-derived transcript evidence obtained for either species. Briefly, for P. canadensis, this transcript evidence set consisted of 29,568 PASA transcripts built from an initial set of 45,271 contig/isotig “transcripts” generated by Newbler using 454-P. canadensis sequence data (generation of PASA transcripts from Newbler data is described in SI Text, section 11.4.1) (5). With regard to D. quadriceps, Augustus was fed with a set of 26,944 PASA assemblies derived from 27,787 transcripts obtained using the Cufflinks program.

Our strategy for taking advantage of the large set of transcripts obtained for both species allowed us to obtain a higher accuracy evidence-based set of Augustus (+hints) predictions on both assemblies. In order for the P. canadensis and the D. quadriceps Augustus matrices to take advantage of the external data, we first had to optimize some internal parameter of the Augustus parameter files for each of the species being trained; the exonpart bonus for hints corresponding to PASA evidence (“E”) was given a bonus of 1 × E3 for P. canadensis and 1 × E2 for D. quadriceps. Also, for every exonpart that was not supported by the PASA evidence, the probability of the gene structure was given a malus or penalization of 0.995 for either species. Furthermore, complete exons predicted by Augustus that perfectly matched the exons in the external hints were given a bonus of 1 × E4 for both species. The intron bonus for (PASA) hints of source E was set to 1 × E4 for P. canadensis and 1 × E5 for D. quadriceps, meaning that a predicted intron would get this bonus if it is exactly as in the PASA “hint.”

The geneid, SGP2, Augustus (with or without hints), and GlimmerHMM D. quadriceps and P. canadensis parameter files, and the D. quadriceps GeneMark matrix were subsequently used to predict genes on the repeat-masked assemblies of either genome. The P. canadensis assembly is made up of 3,852 scaffolds, whereas the D. quadriceps assembly contains 13,571 scaffolds and 719 contigs. Given the species-specific parameter files developed for each of the two species being studied, geneid predicted 20,148 protein-coding genes in P. canadensis and 25,970 sequences in D. quadriceps. SGP2 generated 17,906 predictions on the assembly of P. canadensis and 26,104 gene models in the D. quadriceps genome. The ab initio tool GlimmerHMM produced 28,353 gene models on the scaffolds of P. canadensis and 25,205 in D. quadriceps. The program Augustus predicted 17,009 genes on the assembly of P. canadensis and 12,693 gene models for the genome of D. quadriceps, whereas the evidence-based variation of Augustus (+hints) produced 15,671 predictions for the P. canadensis genome and 17,385 protein-coding genes in the scaffolds of D. quadriceps. The program GeneMark generated 33,262 predictions in the assembly of D. quadriceps. These programs were then used as input to a “combiner” (EVM r2012-06-25), which was the program used to obtain the reference annotation for the genomes.

II.4.4. EVM-based genome annotation of the P. canadensis and D. quadriceps assemblies by combining different sources of evidence using weights.

The resulting transcript (PASA) and protein alignments obtained for both species were then filtered as suggested in the EVM documentation (evidencemodeler.github.io/).

Gene predictions were obtained as described above and also added to the EVM pipeline. Subsequently, the transcript alignments, protein alignments, and ab initio gene models were combined into consensus CDS models by EVM using different weights. These weights (Dataset S1) were selected following the EVM documentation (i.e., evidencemodeler.github.io/) and with regard to the ab initio predictions, also based on the accuracy of the different programs in predicting sequences in the evaluation artificial scaffolds for each of the two species.

The consensus CDS models were then updated with UTRs and alternative exons through three rounds of PASA’s routine to update annotations. The resulting transcripts were grouped into genes, and preselected species-specific identifiers were then assigned to the genes, transcripts, and protein products derived from them. Finally, and as a quality control, the protein products obtained from the reference annotation of both species were aligned against either the exhaustive National Center for Biotechnology Information (NCBI) nonredundant (NR-2013) (D. quadriceps) or the highly curated UniProt-90 (P. canadensis) database using the “protein vs. protein” blastp “flavor” of the sequence comparison tool BLAST (E = 10−3 with a minimum identity of 30%) to determine what percentage of the annotated genes matched a sequence of these large biological sequence public databases. The resulting EVM-based annotation contains 15,799 genes (17,592 transcripts) for P. canadensis and 13,688 genes (18,230 transcripts) for D. quadriceps. Dataset S1 shows the wide range of statistics obtained from the assemblies and analysis of the protein-coding reference annotation sets obtained for both species.

II.5. Functional Annotation of the P. canadensis and D. quadriceps EVM-Based Annotation.

Functional annotation was performed using InterPro, the Kyoto Encyclopedia of Genes and Genomes (KEGG), Blast2GO, and Reactome databases; information about domains, functions, and gene ontology (GO) terms was inferred from the sequences in the repositories to the proteins of interest based on sequence similarity. InterProScan (version 5) was used to scan through all available InterPro databases. BLAST alignment searches against the NCBI NR collection of protein sequences (release 2013-04) were used as input to the local Blast2GO software p2gpipe (version 2.5.0) with the databases go_201303 (March 2013). KEGG orthology (KO) groups were assigned by the KEGG Automatic Annotation Server using the bidirectional best hit method against a representative gene set from 29 different species, including D. melanogaster, A. mellifera, and Nasonia vitripennis. KO identifiers were then used for retrieval, utilizing the KEGG REST-based API service for the KEGG relevant functional annotation, such as metabolic pathways and external database references. Sets of one-two-one orthologs from PhylomeDB were used via the Reactome URL/XML query system to scan the ReactomeDB (version 46, September 2013). Additionally, SignalP was used to predict the presence and location of signal peptide cleavage sites in our proteins of interest.

A total of 86.8% of proteins predicted for D. quadriceps and 79.7% of proteins found in P. canadensis were found to have some type of annotation feature (Dataset S1). Descriptions (name) were assigned to 47.9% (D. quadriceps) and 43.4% (P. canadensis) of the proteins using Blast2GO or the KEGG. GO terms, derived from four different sources, were assigned to 62.4% (D. quadriceps) and 54.5% (P. canadensis) of the annotated proteins. GO terms were grouped by the different functional categories (Fig. S1) using CateGOrizer, utilizing GOSlim without top-level categories (molecular function, biological process, and cellular component).

Unannotated proteins (13.2% of the reference annotation for D. quadriceps and 20.3% for P. canadensis) tend to be shorter compared with the annotated ones (Fig. S8). These smaller proteins could be derived from incomplete ORFs erroneously predicted by the automatic annotation pipeline due to genome fragmentation.

II.6. Comparison of Genomes.

The P. canadensis and D. quadriceps genomes differed in the composition of genes and proteins, as expected from phylogenetically distinct species. We constructed a phylome resource for each species. From this phylome resource, we identified orthologous genes and proteins with genomic databases (P. canadensis: www.phylomedb.org/phylome_134, D. quadriceps: www.phylomedb.org/phylome_135). D. quadriceps showed 70% similarity with proteins in the other ants (Formicidae) and 46% of all proteins shared with the only other sequenced ponerine ant (H. saltator) (Fig. 1C). Conversely, P. canadensis shares similar levels of protein similarity with the bees (Apidae, sharing 34%) and ants (Formicidae, sharing 22%). Less than 25% of proteins were the same in the two species. However, the top 10 GO terms and the GOSlim terms were similar between the two species (Fig. S1).

II.7. Phylogenomic Relationships in the Aculeate Hymenoptera.

We reconstructed the phylogeny of the aculeate Hymenoptera using only the available data on published genome sequences for social insects, including P. canadensis and D. quadriceps. The discussion on whether more taxa or more genes are better is not settled, and there are arguments for both. Because recent multigene analyses of Hymenoptera have challenged the phylogenetic relationships based on a few genes (67), we chose to maximize the opportunity of analyzing phylogeny across hundreds of genes. A maximum likelihood tree was produced based on the combined analysis of a concatenated alignment of 897 single-copy genes present in all species. This tree placed ants as the basal group to wasps and bees (Fig. S1E). This result suggests that the wasp and ant do not belong to the same historically named family Vespoidea (5, 67), and that, instead, wasps share more ancestry with bees than with ants. In support of this conclusion, P. canadensis shared more similarity with Apoidea (bee) proteins than Formicidae (ant) proteins. The tree was reconstructed using randomized accelerated maximum likelihood utilizing the PROTein, GAMMA distributed rates, LG (Lee and Gascuel) model (PROTGAMMALG). Bootstraps lower than 100 are indicated in the nodes of the tree. We reran the analysis with a smaller dataset of 93 genes previously obtained for P. canadensis (5) and found the wasp to be basal to the ants and bees, as also suggested by a recent transcriptome-based analysis of the hymenopteran phylogeny (67). This analysis indicates that topology changes depending on the number of genes used, suggesting that the choice of genes strongly influences resulting topology.

II.8. Transposon Annotation.

Repeats were detected using RepeatMasker (version 4.0.1) with repeat library 20120418. Searches were limited to repeats deriving from Apocrita species. Hits were grouped according to the annotated repeat class, and the grouped classes were analyzed separately (Fig. 1D).

III. Phenotype-Specific Transcriptome Sequencing and Analyses for P. canadensis and D. quadriceps

III.1. Transcriptome Sequencing.

Total RNA was extracted from single brains using the QIAGEN All Prep DNA/RNA Mini Kit according to the manufacturer’s instructions. Where possible, samples of reproductives and nonreproductives were matched within nests. For RNA-seq libraries, 50 to 200 ng of total RNA was enriched for mRNA using Dynabeads Oligo(dT)25 from Invitrogen in two subsequent steps of purification with fresh beads. Fragmentation was done by incubation of mRNAs for 5 min at 94 °C in the First-Strand Buffer (Invitrogen), and directly followed by cDNA synthesis using a SuperScript III Kit (Invitrogen) according to the manufacturer’s instructions. dUTPs were incorporated for second-strand synthesis for library orientation. cDNA was end-repaired, A-tailed, and ligated with a methylated Adaptor Oligo Kit (Illumina) using the NEB Next Kit (New England Biolabs) according to the manufacturer’s instructions. dUTP excision was done before amplification using USER mix (New England Biolabs). Libraries were amplified with 16 cycles using 2× Phusion HF buffer (New England Biolabs). Size selection and cleaning between steps were performed with the AMPure XP system (Agencourt) to select DNA fragments between 250 bp and 500 bp. Paired-end libraries were sequenced on either an Illumina GAIIx or Illumina HiSeq system.

III.2. EVM and de Novo Assembly.

The EVM/PASA approach (SI Text, section II.5) provides a comprehensive protein-based annotation from multiple sources of data but can miss transcripts due to tissue-specific transcription of some genes, low-coverage genes, and noncoding elements. To obtain the most comprehensive transcriptome annotation, we performed additional de novo assemblies with two different and widely used methods (Cufflinks and Trinity) and merged these assemblies with the EVM/PASA annotation. Using Cufflinks, we started with the independent assembly for each sample, without providing any annotation and with the default parameters. In the next step, we merged all of the individual annotations plus the EVM annotation (described in SI Text, section II.5) by using the Cuffmerge program. We then proceeded by removing duplicate transcripts from this merged annotation. We used the program Gffread from the Cufflinks package (−M and −d) to collapse matching transcripts and obtain a final set of transcripts without duplicates.

Next, we conducted a de novo transcript assembly using Trinity. This procedure recovered additional transcripts that were not found in the previous analyses. Additional transcripts were added to the final assembly, where no existing sequence was previously described. The process involved the use of “in-house” scripts to automate raw read filtering with a “fastq_quality_filter” (Hannon laboratory FASTX-Toolkit, hannonlab.cshl.edu/fastx_toolkit/) to remove 5′ and/or 3′ regions with Phred scores less than 30 and ribosomal RNA removal using the SortMeRna (version 1.99 beta) pipeline, using Insecta 28S and 18S databases downloaded from SILVA (www.arb-silva.de/). Quality checking was performed by FastQC. Next, Trinity was used with default settings, except for −SS_lib_type FR [forward/reverse (mate-pair data)]. This step was performed on a high-memory node of the Advanced Computing Research Centre (University of Bristol, www.bris.ac.uk/acrc/).

For P. canadensis, this procedure resulted in 15,595 genes with 53,181 transcripts. For D. quadriceps, 12,765 genes and 57,730 transcripts were found.

RSEM (RNA-Seq by Expectation-Maximization) was used to calculate transcript abundance quantifications for each individual by mapping individual RNA-seq reads (the same reads that were filtered for Trinity) against the combined EVM, Cufflinks, and Trinity merged assembly. This process created gene and isoform raw counts and fragments per kilobase per million (FPKM) reads.

III.3 Identifying Differentially Expressed Genes (DEGs).

III.3.1. DEG identification.

Raw read counts and FPKM values from the combined assembly (SI Text, section III.2) were formatted for use in EdgeR and NOISeq-BIO, respectively. Performance of the software for differential gene expression analysis declines with biological variation in the data. Thus, we decided to use two different programs: EdgeR, which uses negative binominal distributions and general linear models (glms), and NOISeq-BIO, which uses a nonparametric approach. The list of genes obtained as the “union” of DEGs resulting from these two programs is an exhaustive list. For both programs, genes expressed below a median of 1 for all individuals were discarded, and a 5% FDR was used: FDR < 0.05 for EdgeR analyses and q > 0.95 (equivalent to FDR < 0.05) for NOISeq analyses. Some genes satisfy both algorithmic criteria, but some are specific to NOISeq or EdgeR alone. In this way, we found ∼30–40% more DEGs, but both programs suggest that low transcriptomic differences exist between castes of both species. The 14 D. quadriceps ribosomal proteins that were found differentially expressed in NOISeq alone were a specific example of program-specific DEGs (Fig. S2C). They have a low mean fold difference between conditions and so will not be picked up by EdgeR, yet they have conserved expression around the two means in the two conditions, without overlapping ranges of expression. This result shows the importance of the statistics behind different expression programs; using just one could potentially hide diversity that exists. We do acknowledge that using two FDRs in union removes the significance of the respective statistics; however, union of results provides a comprehensive list of differences, with an FDR of 10%, <0.1 maximum for the combined set.

Homologous relationships from the phylome (SI Text, section II.7) were used to compare DEG lists between the two species; we found 11 genes in common. The significance of the overlap was assessed by a 1,000-fold permutation test with the sampling size of a random set of 67 and 147 proteins, respectively, from P. canadensis and D. quadriceps. The random overlap was 1.2 genes, which is significantly different from our observed overlap (χ2 = 8.68, df = 1, P = 0.003).

III.3.2. Characterization of taxon-restricted DEGs.

All differentially expressed transcripts underwent sequence searching by BLAST [blastn: e-value = <1e-10, gap-open = 1, gap-extend = 1; then tblastx: e-value = <1e-4 (default settings, using default BLAST nucleotide database “nt” and NR protein database “nr” respectively)], and the largest ORF for each transcript was then defined (blastp: e-value = <1e-4, nr database), with changing taxonomically restricted filters. First, they underwent sequence searching by BLAST against the whole database; they were then limited to Vespiodea (which includes the ants and wasps) and then Vespidae (wasps) or Formicidae (ants); and they were finally limited to the genus for either Polistes or Dinoponera. Unidentified genes were those genes that were restricted to genus level only. We cannot rule out that with greater taxonomic sampling, some previously unidentified genes may be found in other taxonomic lineages. Hence, we refer to these genes as taxon-restricted, rather than novel. DEGs were manually inspected to remove homology assignment to small repetitive regions, transposons, or conserved domains that artificially gave BLAST support (most of these sequences had closest homology to mouse and human sequences only). To compare the representation of differentially expressed TRGs relative to the genome as a whole, we sampled 2,500 random proteins from both ants and wasps, using only genes that had expression in the same range as in differentially expressed analysis (median > 1). Using their largest ORFs, we performed blastp to 1e−4 (or minimum 30% homology to whole NCBI). Using this method, 9.6% of DEGs in the wasp and 11.6% in the ant were taxon-restricted.

III.4. GO Enrichment Analyses of DEGs.

For the GO analysis, we used the Blast2GO Fisher exact test (FDR < 0.5) and selected enrichments with at least five genes in the reference set. We used the assembled list of GO terms associated with each gene and tested for enrichment in all DEGs (SI Text, section III.3.1.1) and in specific coexpression modules (SI Text, section III.5) against a background of all GO terms for each annotated gene.

III.5. Network Analyses of Coexpressed Modules of Genes.

Genes alone give a very limited understanding of transcriptome dynamics. Grouping genes that covary in expression into modules can be more informative than analyzing individual genes. To obtain systems-view of gene expression, we performed coexpression network analysis by using Weighted Gene Coexpression Network Analysis (WGCNA) (41), which allowed us to group genes together that have similar expression patterns (i.e., covary) into “modules.” Genes with correlated expression (in a module) are assumed to have some common role in a particular pathway/functionality. To identify modules of importance in caste differentiation, we then correlated the connectivity of the genes within the module with a gene significance measure (log10 q-value) for differential expression between castes. Genes that are differentially expressed between castes and are highly connected within a module (i.e., positive correlation) are predicted to be drivers of expression in that module, and may reflect an important molecular pathway in caste differentiation. We used this prediction to identify modules of putative importance in caste differentiation in each species and then compared them between species to look for evidence of conserved networks underlying caste differentiation. We tested for nonrandom distribution of DEGs across modules by determining whether the proportion of DEGs is drawn from the same binomial distribution or whether DEGs are clustered within modules. To do this test, we fitted to general linear models, one with a single parameter (the global proportion) and the other with a separate parameter for each of the modules (the saturated model). We compared the best fit of the two models for each species separately using ANOVA, and then also compared the two species.

From the combined assembly, only genes with expression above 1 FPKM across all samples were selected for network analyses (11,427 in P. canadensis and 11,758 in D. quadriceps). As a gene significance measure, we used the −log10 q-value obtained from the q-value of the NOISeq differential expression test between castes for each species. Default settings were used, except “BETA = 8” in P. canadensis and “BETA = 12” in D. quadriceps. Thirty-one modules were recovered in D. quadriceps [mean number of genes per module is 12 (range: 12–3,715)], and 41 modules were recovered in P. canadensis [mean number of genes is 68 (range: 16–2,268) (Dataset S5). By using the log10 P value (for connectivity measurement in the module, Dataset S5) as a significance score and the log2 fold change to indicate caste-biased differential expression, we identified three (light yellow, magenta, and dark turquoise) and two (yellow, red) modules positively correlated in the ant and wasp, respectively. We then looked into the functionality of the modules, by checking for GO enrichment in the gene terms (as described SI Text, section III.4). The magenta module in the ant was the only one to be significantly functionally enriched, with multiple ribosomal terms. We were also able to use this analysis to pick out key hub (highly connected) genes for phenotypic differentiation.

III.6. Variance Analysis.

FPKM values (SI Text, section III.3) were log-transformed, and statistics were performed using the MatrixStat Package in R. The coefficient of variation (CV), variation, and SD values were calculated for each transcript (Fig. S2 A, C, and D). For Fig. S2D, nonlinear regression lines (RLs) for reproductive and nonreproductive states were calculated based on an exponential decay model curve. CV distances from RLs were plotted, and values below 0.05 of √ (reproductive^2 + nonreproductive^2) were excluded. Transcripts with a distance greater than 0.05 for a specific reproductive state and less than −0.05 for the other reproductive state were marked as specific and highly variable transcripts. Statistical tests (χ2) between variable and nonvariable groups were performed between each reproductive state.

IV. Phenotype-Specific Methylome Sequencing and Analyses for P. canadensis and D. quadriceps

IV.1. DNA Extraction and BS-Seq Library Preparation for Methylomes.

Genomic DNA was extracted from single brains using the QIAGEN All Prep DNA/RNA Mini Kit according to the manufacturer’s instructions. For BS-seq libraries, synthetic genomes of P. canadensis and D. quadriceps were prepared to serve as an unmethylated control genome in the analysis of global 5-methyl-cytosine (5mC). Synthetic genomes for each species were generated from an individual brain extraction using the REPLI-g Ultra-fast Mini Kit from QIAGEN. One microliter of these synthetic genomes was used to generate BS-seq libraries as outlined below. Between 200 ng and 500 ng of input genomic DNA was used per library. One library per species was additionally spiked with unmethylated lambda DNA at 1:1,000 (wt/wt) to provide an estimation of BS conversion efficiency (Bismark reports in Dataset S4). After sonication (Covaris), DNA was end-repaired, A-tailed, and ligated with a methylation Adaptor Oligo Kit (Illumina) using the NEB Next kit according to the manufacturer’s instructions. The adaptor-ligated DNA was treated with sodium-BS using an Imprint DNA Modification Kit from Sigma–Aldrich according to the manufacturer’s instructions for the two-step protocol. BS-treated DNA was amplified using KAPA HiFi Uracil + DNA Polymerase (KAPA Biosystems) with 15 cycles. Size selection and cleaning between steps were performed with an AMPure XP system (Agencourt) to select DNA fragments between 250 bp and 500 bp.

IV.2. BS-Seq Analysis and Quantification.

IV.2.1. Read mapping.

BS-seq raw sequence reads were trimmed to remove both poor-quality calls and adapters using TrimGalore (version 0.3.5 with default parameters, (www.bioinformatics.babraham.ac.uk/projects/trim_galore/) and the standard Illumina adapter contamination. Remaining sequences were mapped to the genome assembly (EVM/PASA; SI Text, section II) using Bismark (version 0.12.2, with the parameters: –bowtie2 –score_min L,0,−0.4) (Dataset S3). As a measure of sequencing depth, genome coverage was calculated per the number of cytosines observed/total number of cytosines in the genome.

IV.2.2. Quantification of methylation.

All methylation analyses were done using SeqMonk (www.bioinformatics.babraham.ac.uk/projects/seqmonk/). Downstream analyses were performed using custom Perl, Python, and R scripts. Statistical tests were performed using R or GraphPad Prism 6.

Quantification and honey bee methylome comparison.

Given the variety of BS-seq quantification methods used in insects so far (24, 26, 29, 34, 37), it was crucial to apply an appropriate and comparable computational method for our data and the BS-seq brain data on the honeybee by Lyko et al. (24). The global methylation levels and distribution analysis for P. canadensis and D. quadriceps and from the raw data of A. mellifera (24) were quantitated using 1-kb tiled windows (Fig. 3 A and B, Table 2, and Dataset S3), as previously reported (68, 69). The 1-kb regions are of similar length to the average length of methylated regions in other insects (29). The methylation percentage for each region corresponds to the average percentage of methylation across the observed cytosines of a given context within the window. Regions containing fewer than 20 methylation calls were excluded from the analysis. For P. canadensis and D. quadriceps, false-positive results were also removed if a region appeared consistently methylated in each individual and in the synthetic genome of that individual’s respective species. Initially, we detected significantly methylated regions (SMRs) if a region was significantly higher methylated compared with the synthetic genome (Fisher test on the ratio of combined methylation calls, P < 0.05). Because no synthetic genome was available to validate the methylated regions for A. mellifera, a minimum of 10% methylation level was used as a threshold to identify highly methylated regions (HMRs) in the three species. For P. canadensis and D. quadriceps, SMRs and HMRs strongly overlap by 87% and 98% in P. canadensis and D. quadriceps, respectively. Therefore, we choose to display the values of HMRs in Table 2. Methylated genes were selected if at least one HMR overlapped with a gene body. Base-pair quantification of the largest genomic scaffolds in each species (3.1 Mb and 5.8 Mb covering 232 and 294 genes in P. canadensis and D. quadriceps, respectively) were used to confirm the lack of highly methylated Cyts (Fig. S3B). These scaffolds were chosen instead of the full genome because the P. canadensis genome contains some scaffolds with unmethylated repeat elements that interfere with a genome-wide base-pair analysis. Independent of the origin of the tissues, we also performed interspecies methylome comparisons (Fig. S3C) using the raw data of several insect species downloaded from the NCBI Sequence Read archive (Dataset S3). With the exception of D. melanogaster, all insect species appear to have intragenic DNA methylation enrichment.

Quantification of genomic features and repeats.

Methylation across genomic features (gene, intron, exon, 3′ UTR, 5′ UTR, and intergenic regions) (Fig. S3D) was quantified by calculating the percentage of methylation for each individual feature that contained at least 10 measured cytosine. The overall methylation level for each feature was taken to be the mean of the methylation values for the individually measured cytosines. For the whole genome, methylation value windows of 10 kb were tiled over the genome and then treated in the same way as other features.

Methylation quantification at the gene level.

To associate methylation level with gene expression (Fig. 3E and Fig. S6B), methylation levels per gene were calculated using an average of methylation values in 200-bp windows positioned over a normalized gene length using the SeqMonk BS-seq pipeline with default parameters.

IV.3. Non-CG Methylation Validation.

To eliminate the possibility that non-CG methylation was false positively called due to SNPs in the genome assembly, we analyzed the downstream base of the reads from which the non-CG calls originated (Fig. S3A). The first 1 million reads of each library were used to count the methylated and nonmethylated cytosines that align to WT or variable sequences to estimate the levels of the non-CG methylation wrongly assigned due to genomic CpG SNPs. Both contexts of non-CGs (CHG and CHH where H signifies adenine, thymine, or cytosine) were analyzed.

IV.4. CpG Island Analysis.

CG methylation drops around the transcription start site (TSS) more drastically in A. mellifera compared with D. quadriceps and P. canadensis; both of our study species seem to maintain a permissive transcriptional environment at the TSS (i.e., with no methylation; Fig. 3A), a feature that often indicates genomic CpG islands (CGIs) in vertebrates (70). To detect CGIs in our species, we first applied a widely used algorithm developed for the human genome (71). We found only 256 CGIs in P. canadensis, whereas more than 26,000 were detected in D. quadriceps, comparable to the number of CGIs found in other ant species (6). Conventional algorithms for the detection of CGIs have been reported to underpredict CGIs in nonhuman species (72). Indeed, by plotting the CG density around the TSS in P. canadensis, we found significant enrichment of CGs around the TSS compared with the basal intergenic level, although P. canadensis has shorter stretches of CG enrichment compared with D. quadriceps, as expected from the CGI prediction numbers (Fig. S6A). Our findings show that such genomic features are conserved well beyond vertebrate genomes (72) and indicate a lack of maintenance of CGI boundaries and density specifically in insects.

IV.5. Phylogenetic Analyses of Methylation Machinery.

The NCBI NR database was searched for homologs of Dnmt1, Dnmt2, Dnmt3a/3b, Uhrf1, Tdg, Smug, Ung, and Tet enzymes using the BLAST algorithm. The collection was extended by acquiring the sequences of additional proteins identified in the Transcriptome Shotgun Assembly Database (tsa_nr) and high-throughput genomic sequences databases. Next, the sequence collection was sorted and verified for the completeness of protein sequences, proper assignment to the protein groups, and presence of characteristic protein domains and catalytic motifs. When possible, the protein sequences were also retrieved via BLAST searches from annotated protein databases [official gene sets (OGSs) are listed in Dataset S3] available for hymenopteran genomes. The phylogenetic tree was based on the NCBI taxonomy and generated with the interactive tree of life (Fig. S5A).

IV.5.1. Analysis of Dnmt3 synteny.

Genomic loci containing the Dnmt3 genes from various hymenopteran genomes were identified within the OGSs or via tblastx searches in the corresponding genomes. Next, the whole genomic scaffold containing the gene or the Dnmt3 locus flanked by 100-kb sequences on both sides was isolated. The genomic sequences were validated to contain the gene coding for Dnmt3 orthologous in the expected place. Genomic sequence alignment was performed using Mauve software utilizing default settings. Next, the resulting genomic alignment was centered at the Dnmt3 locus (Fig. S4A).

IV.5.2. CXXC Dnmt1 alignment and structural analysis.

The CXXC domains were extracted from aligned protein sequences of Dnmt1 proteins based on the domain prediction with NCBI Conserved Domains Database. The validated and extracted sequences of Dnmt1 CXXC domains were realigned using ClustalX and visualized using BioEdit (www.mbio.ncsu.edu/bioedit/bioedit.html) software (Fig. S4D). Next, the DNA contacting loops of the CXXC domain of murine Dnmt1 (73) were identified in the crystal structure (Protein Data Bank ID code 3PT6). The amino acid exchanges within the CXXC domain of P. canadensis that are in close proximity to DNA or could influence the overall arrangement of DNA contacting loops in the CXXC domain were identified based on the comparison with human, murine, and hymenopteran CXXC sequences.

IV.6. Analysis of Global 5mC and 5-Hydroxymethyl-Cytosine Levels by Liquid Chromatography/MS.

Five hundred nanograms of genomic DNA was incubated with 5 units of DNA Degradase Plus (Zymo Research) at 37 °C for 3 h. The resulting mixture of 2′-deoxynucleosides was analyzed on a Triple Quad 6500 mass spectrometer (AB Sciex) fitted with an Infinity 1290 LC system (Agilent) and an Acquity UPLC HSS T3 column (Waters), using a gradient of water and acetonitrile with 0.1% formic acid. External calibration was performed using synthetic standards, and for accurate quantification, all samples and standards were spiked with isotopically labeled nucleosides as described by Bachman et al. (74) (Fig. S5C).

IV.7. Functional Enrichment Analyses of Methylated Genes (GO) and Ortholog Identification.

Functional enrichment of methylated genes (Dataset S4) was assessed by generating a NR list of genes from an initial transcript list and analyzing this list with Blast2GO, using a background of all genes. Groups with more than 10 genes with a significance level of P < 0.05 (Fisher test) were taken to be significant and used to generate functional enrichment tables. To find methylated orthologs between species, two BLAST databases were created using protein sequences. Each protein then underwent sequence searching by BLAST against the database from the other species. When a pair of proteins both matched the other’s database with ≥40% identity and ≥30 amino acids, they were classed as orthologs.

IV.8. Identification of Differentially Methylated Genes (DMGs).

To identify DMGs associated with phenotypes in P. canadensis and D. quadriceps, we used the 1-kb window probe quantification method (SI Text, section IV.2). For each probe, the methylation levels of the three reproductive replicates were compared with the three nonreproductive replicates using a Student’s t test (P < 0.05) and applying Benjamini and Hochberg multiple testing correction. For both species, the outcome of this test failed to detect any candidate genes (overlapping with at least one significant probe) having a phenotype-associated methylation signature.

IV.9. Alternative Splicing (AS) Analysis.

For the alternative splicing analysis, we used only protein-coding genes from the EVM-based annotation.

Reads from all RNA-seq samples were aligned to the reference assembly and EVM-based annotation using gemtools RNAseq pipeline (version 1.6.2), which uses Graphical Environment Manager (GEM; algorithms.cnag.cat/wiki/The_GEM_library) as a mapper. GEM is “split-aware” and finds gapped matches; that is, it can correctly align reads originating from exon junctions into the genome. Additional filtering steps were performed to allow fewer than two mismatches and five multimappings. On average, 74 ± 4% (D. quadriceps) and 82 ± 2% (P. canadensis) of the reads were mapped across samples, with 72 ± 4% (D. quadriceps) and 81 ± 2% (P. canadensis) of the reads mapping uniquely. Flux Capacitor (version 1.2.4, sammeth.net/confluence/display/FLUX/Home) was used to quantify genes, transcripts, exons, and splice junctions in each sample separately. Expression levels were given in pure read counts and in reads per kilobase per million (RPKM).

IV.9.1. Characteristics of AS isoforms.

To get an overview of alternative splicing activity, we computed the number of expressed isoforms per gene and the splicing entropy across all samples (75).

The splicing entropy H (Fig. 3F) of a given gene g with n annotated isoforms with relative frequencies p1,...,pn in a given organ is computed as

H(g)=i=1npilnpi.

H(g) grows with a number of annotated isoforms and with their equifrequency. H(g) is zero when there is only one isoform expressed and is at its maximum when all isoforms are equally expressed.

IV.9.2. Phenotype-associated isoforms.

For this analysis, only the genes with at least two annotated isoforms and an expression level ≥0.3 RPKM were taken: 2,505 (D. quadriceps) and 1,081 (P. canadensis) genes in total. For each species, pairwise comparison was performed between different phenotypes to identify the phenotype-specific isoform use of genes. The relative splicing ratio was calculated for every transcript, and its variability was then compared within each group and also between groups. When dispersion of the splicing ratios across groups is similar, scores tend to be close to zero, whereas scores increase with the difference in the ratios. Once all genes were tested, the P values were corrected for multiple testing using an FDR control. In total, 28 genes with an FDR < 0.5 were identified in D. quadriceps. No genes with a significant FDR level were identified in P. canadensis. Functional annotation of ASGs was performed as described in section III.4. and the results are shown in Dataset S3.

V. MicroRNA Sequencing and Analyses for P. canadensis and D. quadriceps

V.1. Library Construction and Analysis.

miRNAs are important regulators of phenotype differentiation in social insects (40, 76). miRNA libraries were constructed for pools of individuals of each phenotype for both species to determine whether large numbers of miRNAs are shared between hymenopterans to the exclusion of the other insects. Total RNA was extracted from multiple individuals, developmental stages, and castes for both taxa using a TRIzol (THERMO) extraction. The Illumina TruSeq Small RNA Sample Preparation Kit was used for the library construction, with 27.3 million (D. quadriceps) and 31.5 million (P. canadensis) reads sequenced. These reads were processed using Galaxy and analyzed using miRDeep2 in conjunction with a variety of customs scripts to reduce the number of misannotated miRNAs. Previously unidentified miRNA families were identified following established criteria, where miRNA loci were considered distinct families if there were mismatches in the seed region (nucleotides 2–8) (77, 78) so that only high-confidence miRNA families were annotated (i.e., the presence of both 5′ and 3′ products and clear homogeneity in the processing). In addition, BLAST searches of the genomes were conducted to identify any miRNAs that were not present in the small RNA libraries but have been annotated in other species.

This combined approach allowed the identification of 73 miRNA gene families, including four previously unidentified families in P. canadensis and 86 miRNA gene families, including 11 previously unidentified families in D. quadriceps (submitted to miRBase). The predictive nature of the miRNA complement (79) allows an assessment of the completeness of sequencing, which is high, with only 13 miRNA families missing (four in D. quadriceps and nine in P. canadensis), and equates to 8% missing data (genome screens for Acyrthosiphon indicate ∼24% missing data), although some of those data missing will be genuine secondary losses rather than incomplete sampling. These data allow us to reconstruct the evolutionary history of many eusocial insect-specific miRNAs; previous experimental data have been confined solely to Apis. Thus, we identify five miRNA families shared by all apocritans, and nine shared by aculeatans (Fig. S7).

V.2. miRNA Identification and Targeting.

miRNA can play a role in post-transcriptional gene regulation either by precisely binding to target sites in the 3′ UTR of the candidate mRNA, which is then cleaved, leading to degradation of the mRNA, which is common in plants, or by imprecise binding of the 5′ end of the miRNA onto the target site on the 3′ UTR of the mRNA, leading to down-regulation and repression of the mRNA, which is more common in animals. Thus, we investigated whether miRNAs are preferentially targeting DEGs between phenotypes in comparison to those genes that are nondifferentially expressed (Dataset S4).

TargetSpy was used to identify miRNA/target interactions, and it uses two input files: a list of miRNAs and a list of candidate 3′ UTRs. Custom scripts were used to look for the last CDS of a gene, and this process was then extended on the 3′ side until it reached 500 bp, until it reached the end of the genome sequence, or until another gene was hit. This 3′ UTR was then extracted for all 35 DEGs and 3,121 non-DEGs in D. quadriceps and for all 33 DEGs and 2,223 non-DEGs in P. canadensis. Against these 3′ UTR sequences, 195 miRNA products (known 5′ and 3′ sequences) and 141 miRNA products were investigated for D. quadriceps and P. canadensis, respectively. Because 3′ UTR length was variable (D. quadriceps: non-DEGs = 319 nt, DEGs = 372 nt; P. canadensis: non-DEGs = 313 nt, DEGs = 429 nt), and because the longer the 3′ UTR, the greater was the likelihood of target sites being identified, the data were normalized per 100 nucleotides.

Our results (Fig. S7) showed that there was no difference in the number of miRNA target sites between the non-DEGs (2.8 hits per 100 nucleotides, SD = 1.5) or the DEGs (3.0 hits per 100 nucleotides, SD = 1.4) in P. canadensis or the non-DEGs (3.4 hits per 100 nucleotides, SD = 2.1) and the DEGs (3.8 hits per 100 nucleotides, SD = 2.1) in D. quadriceps.

Supplementary Material

Supplementary File
pnas.1515937112.sd01.xlsx (41.1KB, xlsx)
Supplementary File
pnas.1515937112.sd02.xlsx (460.1KB, xlsx)
Supplementary File
pnas.1515937112.sd03.xlsx (22.8KB, xlsx)
Supplementary File
pnas.1515937112.sd04.xlsx (198.3KB, xlsx)
Supplementary File
pnas.1515937112.sd05.xlsx (20.3KB, xlsx)
Supplementary File
pnas.1515937112.sd01.xlsx (41.1KB, xlsx)
Supplementary File
pnas.1515937112.sd02.xlsx (460.1KB, xlsx)
Supplementary File
pnas.1515937112.sd03.xlsx (22.8KB, xlsx)
Supplementary File
pnas.1515937112.sd04.xlsx (198.3KB, xlsx)
Supplementary File
pnas.1515937112.sd05.xlsx (20.3KB, xlsx)

Acknowledgments

We thank J. O. Dantas, A. Andrade, N. Dantas, R. Zaurin, E. Bell, R. Southon, W. T. Wcislo, J. Morales, and staff at the Galeta field station at the Smithsonian Tropical Research Institute Panama and the Universidade Federal de Sergipe for help and logistical support in fieldwork; D. Datta at the Centre for Genomic Regulation (CRG), K. Tabbada at the Babraham Institute, and N. Smerdon at The Wellcome Trust Sanger Institute for assistance with sequencing; T. Alioto of the Centro Nacional de Análisis Genómico in Barcelona for bioinformatics assistance; N. J. B. Isaac for statistical advice; groups that provided us with unpublished data (Dataset S3); and members of the W.R. and S.S. laboratories for their useful comments during the preparation of the manuscript. This work was conducted under Collecting and Export Permits SE/A-20-12, 10BR004553/DF, and 11BR006471/DF. This work was funded by Natural Environment Research Council Grants NE/G000638/1, NBAF581, and NE/K011316/1 (to S.S.) and Grant NE/G012121/1 (to W.O.H.H. and S.S.); the Research Councils UK (S.S); the Cancer Research UK Grant C14303/A17197 (to S.B); the Leverhulme Trust (W.O.H.H.); German Federal Ministry of Education and Research Grant FKZ 0315962 B; CRG core funding (to H.H.); Spanish Ministry of Economy and Competitiveness (MINECO) Grant BIO2012-37161 (to T.G.); MINECO Grant BIO2011-26205 (to R.G.); Instituto de Salud Carlos III Grant PT13/0001/0021 (to R.G.); the Instituto Nacional de Bioinformatica and Agència de Gestió d'Ajuts Universitaris i de Recerca (R.G.); Wellcome Trust Grants 095645/Z/11/Z (to W.R.) and WT099232 (to S.B); Biotechnology and Biological Sciences Research Council Grant BB/K010867/1 (to W.R.); the Stuttgart Universität (T.P.J.); and Fundaçao de Amparo à Pesquisa do Estado de Sao Paulo Grant 2010/10027-5 (to F.S.N.).

Footnotes

Conflict of interest statement: S.B. is a founder and shareholder of Cambridge Epigenetix Limited, and W.R. is a consultant and shareholder of Cambridge Epigenetix Limited.

This article is a PNAS Direct Submission.

Data deposition: Genomic analyses were performed on the whole-genome assemblies of Polistes canadensis and Dinoponera quadriceps, deposited at the DNA Data Bank of Japan/European Molecular Biology Laboratory/GenBank under the accession nos. PRJNA253269 and PRJNA253275, respectively. Raw data from all bisulfite-sequencing and RNA-sequencing libraries were deposited in the Gene Expression Omnibus (GEO) database (accession no. GSE59525).

See Commentary on page 13755.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1515937112/-/DCSupplemental.

References

  • 1.West-Eberhard MJ. Developmental Plasticity and Evolution. Oxford Univ Press; New York: 2003. [Google Scholar]
  • 2.Pfennig DW, et al. Phenotypic plasticity’s impacts on diversification and speciation. Trends Ecol Evol. 2010;25(8):459–467. doi: 10.1016/j.tree.2010.05.006. [DOI] [PubMed] [Google Scholar]
  • 3.Pigliucci M, Murren CJ, Schlichting CD. Phenotypic plasticity and evolution by genetic assimilation. J Exp Biol. 2006;209(Pt 12):2362–2367. doi: 10.1242/jeb.02070. [DOI] [PubMed] [Google Scholar]
  • 4.Schlichting CD, Wund MA. Phenotypic plasticity and epigenetic marking: An assessment of evidence for genetic accommodation. Evolution. 2014;68(3):656–672. doi: 10.1111/evo.12348. [DOI] [PubMed] [Google Scholar]
  • 5.Ferreira PG, et al. Transcriptome analyses of primitively eusocial wasps reveal novel insights into the evolution of sociality and the origin of alternative phenotypes. Genome Biol. 2013;14(2):R20. doi: 10.1186/gb-2013-14-2-r20. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Kapheim KM, et al. Social evolution. Genomic signatures of evolutionary transitions from solitary to group living. Science. 2015;348(6239):1139–1143. doi: 10.1126/science.aaa4788. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Rehan SM, Toth AL. Climbing the social ladder: The molecular evolution of sociality. Trends Ecol Evol. 2015;30(7):426–433. doi: 10.1016/j.tree.2015.05.004. [DOI] [PubMed] [Google Scholar]
  • 8.Patalano S, Hore TA, Reik W, Sumner S. Shifting behaviour: Epigenetic reprogramming in eusocial insects. Curr Opin Cell Biol. 2012;24(3):367–373. doi: 10.1016/j.ceb.2012.02.005. [DOI] [PubMed] [Google Scholar]
  • 9.Oster G, Wilson EO. Caste and Ecology in the Social Insects. Princeton Univ Press; Princeton: 1978. [PubMed] [Google Scholar]
  • 10.Hughes WOH, Oldroyd BP, Beekman M, Ratnieks FLW. Ancestral monogamy shows kin selection is key to the evolution of eusociality. Science. 2008;320(5880):1213–1216. doi: 10.1126/science.1156108. [DOI] [PubMed] [Google Scholar]
  • 11.Sumner S, Kelstrup H, Fanelli D. Reproductive constraints, direct fitness and indirect fitness benefits explain helping behaviour in the primitively eusocial wasp, Polistes canadensis. Proc Biol Sci. 2010;277(1688):1721–1728. doi: 10.1098/rspb.2009.2289. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Monnin T. Dominance hierarchy and reproductive conflicts among subordinates in a monogynous queenless ant. Behav Ecol. 1999;10(3):323–332. [Google Scholar]
  • 13.Yan H, et al. Eusocial insects as emerging models for behavioural epigenetics. Nat Rev Genet. 2014;15(10):677–688. doi: 10.1038/nrg3787. [DOI] [PubMed] [Google Scholar]
  • 14.Peeters C. Monogamy and polygyny in ponerine ants with or without queens. In: Keller L, editor. Queen Number and Sociality in Insects. Oxford Univ Press; Oxford: 1993. [Google Scholar]
  • 15.Jandt JM, Tibbetts EA, Toth AL. Polistes paper wasps: A model genus for the study of social dominance hierarchies. Insectes Soc. 2014;61(1):11–27. [Google Scholar]
  • 16.Robinson MD, McCarthy DJ, Smyth GK. edgeR: A Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26(1):139–140. doi: 10.1093/bioinformatics/btp616. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Tarazona S, García-Alcalde F, Dopazo J, Ferrer A, Conesa A. Differential expression in RNA-seq: A matter of depth. Genome Res. 2011;21(12):2213–2223. doi: 10.1101/gr.124321.111. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Ansel J, et al. Cell-to-cell stochastic variation in gene expression is a complex genetic trait. PLoS Genet. 2008;4(4):e1000049. doi: 10.1371/journal.pgen.1000049. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Newman JR, et al. Single-cell proteomic analysis of S. cerevisiae reveals the architecture of biological noise. Nature. 2006;441(7095):840–846. doi: 10.1038/nature04785. [DOI] [PubMed] [Google Scholar]
  • 20.Viney M, Reece SE. Adaptive noise. Proc Biol Sci. 2013;280(1767):20131104–20131104. doi: 10.1098/rspb.2013.1104. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Lehner B. Conflict between noise and plasticity in yeast. PLoS Genet. 2010;6(11):e1001185. doi: 10.1371/journal.pgen.1001185. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Losick R, Desplan C. Stochasticity and cell fate. Science. 2008;320(5872):65–68. doi: 10.1126/science.1147888. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Raser JM, O’Shea EK. Noise in gene expression: Origins, consequences, and control. Science. 2005;309(5743):2010–2013. doi: 10.1126/science.1105891. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Lyko F, et al. The honey bee epigenomes: Differential methylation of brain DNA in queens and workers. PLoS Biol. 2010;8(11):e1000506. doi: 10.1371/journal.pbio.1000506. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Kucharski R, Maleszka J, Foret S, Maleszka R. Nutritional control of reproductive status in honeybees via DNA methylation. Science. 2008;319(5871):1827–1830. doi: 10.1126/science.1153069. [DOI] [PubMed] [Google Scholar]
  • 26.Li-Byarlay H, et al. RNA interference knockdown of DNA methyl-transferase 3 affects gene alternative splicing in the honey bee. Proc Natl Acad Sci USA. 2013;110(31):12750–12755. doi: 10.1073/pnas.1310735110. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Wojciechowski M, et al. Insights into DNA hydroxymethylation in the honeybee from in-depth analyses of TET dioxygenase. Open Biol. 2014;4(8):140110. doi: 10.1098/rsob.140110. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Hunt BG, Glastad KM, Yi SV, Goodisman MAD. Patterning and regulatory associations of DNA methylation are mirrored by histone modifications in insects. Genome Biol Evol. 2013;5(3):591–598. doi: 10.1093/gbe/evt030. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Wang X, et al. Function and evolution of DNA methylation in Nasonia vitripennis. PLoS Genet. 2013;9(10):e1003872. doi: 10.1371/journal.pgen.1003872. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Bonasio R, et al. Genome-wide and caste-specific DNA methylomes of the ants Camponotus floridanus and Harpegnathos saltator. Curr Biol. 2012;22(19):1755–1764. doi: 10.1016/j.cub.2012.07.042. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Sarda S, Zeng J, Hunt BG, Yi SV. The evolution of invertebrate gene body methylation. Mol Biol Evol. 2012;29(8):1907–1916. doi: 10.1093/molbev/mss062. [DOI] [PubMed] [Google Scholar]
  • 32.Hunt BG, Brisson JA, Yi SV, Goodisman MAD. Functional conservation of DNA methylation in the pea aphid and the honeybee. Genome Biol Evol. 2010;2:719–728. doi: 10.1093/gbe/evq057. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Zemach A, McDaniel IE, Silva P, Zilberman D. Genome-wide evolutionary analysis of eukaryotic DNA methylation. Science. 2010;328(5980):916–919. doi: 10.1126/science.1186366. [DOI] [PubMed] [Google Scholar]
  • 34.Herb BR, et al. Reversible switching between epigenetic states in honeybee behavioral subcastes. Nat Neurosci. 2012;15(10):1371–1373. doi: 10.1038/nn.3218. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Lockett GA, Kucharski R, Maleszka R. DNA methylation changes elicited by social stimuli in the brains of worker honey bees. Genes Brain Behav. 2012;11(2):235–242. doi: 10.1111/j.1601-183X.2011.00751.x. [DOI] [PubMed] [Google Scholar]
  • 36.Cingolani P, et al. Intronic non-CG DNA hydroxymethylation and alternative mRNA splicing in honey bees. BMC Genomics. 2013;14:666. doi: 10.1186/1471-2164-14-666. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Foret S, et al. DNA methylation dynamics, metabolic fluxes, gene splicing, and alternative phenotypes in honey bees. Proc Natl Acad Sci USA. 2012;109(13):4968–4973. doi: 10.1073/pnas.1202392109. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Flores K, et al. Genome-wide association between DNA methylation and alternative splicing in an invertebrate. BMC Genomics. 2012;13(1):480. doi: 10.1186/1471-2164-13-480. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Jarosch A, Stolle E, Crewe RM, Moritz RF. Alternative splicing of a single transcription factor drives selfish reproductive behavior in honeybee workers (Apis mellifera) Proc Natl Acad Sci USA. 2011;108(37):15282–15287. doi: 10.1073/pnas.1109343108. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Greenberg JK, et al. Behavioral plasticity in honey bees is associated with differences in brain microRNA transcriptome. Genes Brain Behav. 2012;11(6):660–670. doi: 10.1111/j.1601-183X.2012.00782.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Langfelder P, Horvath S. WGCNA: An R package for weighted correlation network analysis. BMC Bioinformatics. 2008;9:559. doi: 10.1186/1471-2105-9-559. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Arendt J, Reznick D. Convergence and parallelism reconsidered: What have we learned about the genetics of adaptation? Trends Ecol Evol. 2008;23(1):26–32. doi: 10.1016/j.tree.2007.09.011. [DOI] [PubMed] [Google Scholar]
  • 43.Toth AL, Robinson GE. Evo-devo and the evolution of social behavior. Trends Genet. 2007;23(7):334–341. doi: 10.1016/j.tig.2007.05.001. [DOI] [PubMed] [Google Scholar]
  • 44.Mikheyev AS, Linksvayer TA. Genes associated with ant social behavior show distinct transcriptional and evolutionary patterns. eLife. 2015;4:e04775. doi: 10.7554/eLife.04775. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Parker J, et al. Genome-wide signatures of convergent evolution in echolocating mammals. Nature. 2013;502(7470):228–231. doi: 10.1038/nature12511. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Stern DL. The genetic causes of convergent evolution. Nat Rev Genet. 2013;14(11):751–764. doi: 10.1038/nrg3483. [DOI] [PubMed] [Google Scholar]
  • 47.Sumner S. The importance of genomic novelty in social evolution. Mol Ecol. 2014;23(1):26–28. doi: 10.1111/mec.12580. [DOI] [PubMed] [Google Scholar]
  • 48.Berens AJ, Hunt JH, Toth AL. Comparative transcriptomics of convergent evolution: Different genes but conserved pathways underlie caste phenotypes across lineages of eusocial insects. Mol Biol Evol. 2015;32(3):690–703. doi: 10.1093/molbev/msu330. [DOI] [PubMed] [Google Scholar]
  • 49.Chen S, Krinsky BH, Long M. New genes as drivers of phenotypic evolution. Nat Rev Genet. 2013;14(9):645–660. doi: 10.1038/nrg3521. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50.Simola DF, et al. Social insect genomes exhibit dramatic evolution in gene composition and regulation while preserving regulatory features linked to sociality. Genome Res. 2013;23(8):1235–1247. doi: 10.1101/gr.155408.113. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51.Wissler L, Gadau J, Simola DF, Helmkampf M, Bornberg-Bauer E. Mechanisms and dynamics of orphan gene emergence in insect genomes. Genome Biol Evol. 2013;5(2):439–455. doi: 10.1093/gbe/evt009. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Roberts SB, Gavery MR. Is there a relationship between DNA methylation and phenotypic plasticity in invertebrates? Front Physiol. 2012;2:116. doi: 10.3389/fphys.2011.00116. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53.Chandrasekaran S, et al. Behavior-specific changes in transcriptional modules lead to distinct and predictable neurogenomic states. Proc Natl Acad Sci USA. 2011;108(44):18020–18025. doi: 10.1073/pnas.1114093108. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54.Ometto L, Shoemaker D, Ross KG, Keller L. Evolution of gene expression in fire ants: The effects of developmental stage, caste, and species. Mol Biol Evol. 2011;28(4):1381–1392. doi: 10.1093/molbev/msq322. [DOI] [PubMed] [Google Scholar]
  • 55.Grozinger CM, Fan Y, Hoover SER, Winston ML. Genome-wide analysis reveals differences in brain gene expression patterns associated with caste and reproductive status in honey bees (Apis mellifera) Mol Ecol. 2007;16(22):4837–4848. doi: 10.1111/j.1365-294X.2007.03545.x. [DOI] [PubMed] [Google Scholar]
  • 56.Barchuk AR, et al. Molecular determinants of caste differentiation in the highly eusocial honeybee Apis mellifera. BMC Dev Biol. 2007;7:70. doi: 10.1186/1471-213X-7-70. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57.Ament SA, et al. New meta-analysis tools reveal common transcriptional regulatory basis for multiple determinants of behavior. Proc Natl Acad Sci USA. 2012;109(26):E1801–E1810. doi: 10.1073/pnas.1205283109. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58.Smith CR, et al. How do genomes create novel phenotypes? Insights from the loss of the worker caste in ant social parasites. Mol Biol Evol. 2015 doi: 10.1093/molbev/msv165. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 59.Amarasinghe HE, Clayton CI, Mallon EB. Methylation and worker reproduction in the bumble-bee (Bombus terrestris) Proc Biol Sci. 2014;281(1780):20132502. doi: 10.1098/rspb.2013.2502. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60.Smith CR, et al. Patterns of DNA methylation in development, division of labor and hybridization in an ant with genetic caste determination. PLoS One. 2012;7(8):e42433. doi: 10.1371/journal.pone.0042433. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 61.Shi YY, et al. Diet and cell size both affect queen-worker differentiation through DNA methylation in honey bees (Apis mellifera, Apidae) PLoS One. 2011;6(4):e18808. doi: 10.1371/journal.pone.0018808. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62.Elango N, Hunt BG, Goodisman MA, Yi SV. DNA methylation is widespread and associated with differential gene expression in castes of the honeybee, Apis mellifera. Proc Natl Acad Sci USA. 2009;106(27):11206–11211. doi: 10.1073/pnas.0900301106. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 63.Sumner S, Lucas E, Barker J, Isaac N. Radio-tagging technology reveals extreme nest-drifting behavior in a eusocial insect. Curr Biol. 2007;17(2):140–145. doi: 10.1016/j.cub.2006.11.064. [DOI] [PubMed] [Google Scholar]
  • 64.Asher C, Nascimento F, Sumner S, Hughes WOH. Division of Labour and risk taking in the dinosaur ant, Dinoponera quadriceps (Hymenoptera: Formicidae) Myrmecol News. 2013;18:121–129. [Google Scholar]
  • 65.Dohm JC, et al. The genome of the recently domesticated crop plant sugar beet (Beta vulgaris) Nature. 2014;505(7484):546–549. doi: 10.1038/nature12817. [DOI] [PubMed] [Google Scholar]
  • 66.Parra G, et al. Comparative gene prediction in human and mouse. Genome Res. 2003;13(1):108–117. doi: 10.1101/gr.871403. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 67.Johnson BR, et al. Phylogenomics resolves evolutionary relationships among ants, bees, and wasps. Curr Biol. 2013;23(20):2058–2062. doi: 10.1016/j.cub.2013.08.050. [DOI] [PubMed] [Google Scholar]
  • 68.Ziller MJ, Hansen KD, Meissner A, Aryee MJ. Coverage recommendations for methylation analysis by whole-genome bisulfite sequencing. Nat Methods. 2015;12(3):230–232. doi: 10.1038/nmeth.3152. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 69.Hansen KD, Langmead B, Irizarry RA. BSmooth: From whole genome bisulfite sequencing reads to differentially methylated regions. Genome Biol. 2012;13(10):R83. doi: 10.1186/gb-2012-13-10-r83. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 70.Deaton AM, Bird A. CpG islands and the regulation of transcription. Genes Dev. 2011;25(10):1010–1022. doi: 10.1101/gad.2037511. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 71.Takai D, Jones PA. The CpG island searcher: A new WWW resource. In Silico Biol. 2003;3(3):235–240. [PubMed] [Google Scholar]
  • 72.Long HK, et al. Epigenetic conservation at gene regulatory elements revealed by non-methylated DNA profiling in seven vertebrates. eLife. 2013;2:e00348. doi: 10.7554/eLife.00348. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 73.Song J, Rechkoblit O, Bestor TH, Patel DJ. Structure of DNMT1-DNA complex reveals a role for autoinhibition in maintenance DNA methylation. Science. 2011;331(6020):1036–1040. doi: 10.1126/science.1195380. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 74.Bachman M, et al. 5-Hydroxymethylcytosine is a predominantly stable DNA modification. Nat Chem. 2014;6(12):1049–1055. doi: 10.1038/nchem.2064. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 75.Djebali S, et al. Landscape of transcription in human cells. Nature. 2012;489(7414):101–108. doi: 10.1038/nature11233. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 76.Liu F, et al. Next-generation small RNA sequencing for microRNAs profiling in Apis mellifera: Comparison between nurses and foragers. Insect Mol Biol. 2012;21(3):297–303. doi: 10.1111/j.1365-2583.2012.01135.x. [DOI] [PubMed] [Google Scholar]
  • 77.Tarver JE, Donoghue PCJ, Peterson KJ. Do miRNAs have a deep evolutionary history? BioEssays. 2012;34(10):857–866. doi: 10.1002/bies.201200055. [DOI] [PubMed] [Google Scholar]
  • 78.Tarver JE, et al. miRNAs: Small genes with big potential in metazoan phylogenetics. Mol Biol Evol. 2013;30(11):2369–2382. doi: 10.1093/molbev/mst133. [DOI] [PubMed] [Google Scholar]
  • 79.Taylor RS, Tarver JE, Hiscock SJ, Donoghue PCJ. Evolutionary history of plant microRNAs. Trends Plant Sci. 2014;19(3):175–182. doi: 10.1016/j.tplants.2013.11.008. [DOI] [PubMed] [Google Scholar]
  • 80.Kurt M, Pickett JMC. Simultaneous analysis and the origin of eusociality in the Vespidae (Insecta: Hymenoptera) Arthropod Syst Phylogeny. 2010;68(1):3–33. [Google Scholar]
  • 81.Schmidt C. Molecular phylogenetics of ponerine ants (Hymenoptera: Formicidae: Ponerinae) Zootaxa. 2013;3647:201–250. doi: 10.11646/zootaxa.3647.2.1. [DOI] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplementary File
pnas.1515937112.sd01.xlsx (41.1KB, xlsx)
Supplementary File
pnas.1515937112.sd02.xlsx (460.1KB, xlsx)
Supplementary File
pnas.1515937112.sd03.xlsx (22.8KB, xlsx)
Supplementary File
pnas.1515937112.sd04.xlsx (198.3KB, xlsx)
Supplementary File
pnas.1515937112.sd05.xlsx (20.3KB, xlsx)
Supplementary File
pnas.1515937112.sd01.xlsx (41.1KB, xlsx)
Supplementary File
pnas.1515937112.sd02.xlsx (460.1KB, xlsx)
Supplementary File
pnas.1515937112.sd03.xlsx (22.8KB, xlsx)
Supplementary File
pnas.1515937112.sd04.xlsx (198.3KB, xlsx)
Supplementary File
pnas.1515937112.sd05.xlsx (20.3KB, xlsx)

Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences

RESOURCES