Nature. 2016 Nov 3;539(7627):59-64.
doi: 10.1038/nature20124. Epub 2016 Oct 24.

Defining synonymous codon compression schemes by genome recoding

Kaihang Wang et al. Nature.

Abstract

Synthetic recoding of genomes, to remove targeted sense codons, may facilitate the encoded cellular synthesis of unnatural polymers by orthogonal translation systems. However, our limited understanding of allowed synonymous codon substitutions, and the absence of methods that enable the stepwise replacement of the Escherichia coli genome with long synthetic DNA and provide feedback on allowed and disallowed design features in synthetic genomes, have restricted progress towards this goal. Here we endow E. coli with a system for efficient, programmable replacement of genomic DNA with long (>100-kb) synthetic DNA, through the in vivo excision of double-stranded DNA from an episomal replicon by CRISPR/Cas9, coupled to lambda-red-mediated recombination and simultaneous positive and negative selection. We iterate the approach, providing a basis for stepwise whole-genome replacement. We attempt systematic recoding in an essential operon using eight synonymous recoding schemes. Each scheme systematically replaces target codons with defined synonyms and is compatible with codon reassignment. Our results define allowed and disallowed synonymous recoding schemes, and enable the identification and repair of recoding at idiosyncratic positions in the genome.

Conflict of interest statement

Competing financial interests

The authors declare no competing financial interest.

Figures

Extended Data Figure 1
Simultaneous double selection and recombination enhances integration at a target locus. a. Classical recombination and double selection recombination. In classical recombination, a linear double-stranded DNA with a synthetic DNA (s. DNA) sequence and a positive selection marker (+, CmR) flanked by homologous region 1 (HR1) and homologous region 2 (HR2) is transformed into the cell. Recombinants are selected by expression of the positive selection marker. In simultaneous double selection recombination, s. DNA containing the double selection marker -2/+2 (sacB-CmR) is integrated in place of the double selection marker -1/+1 (rpsL-KanR) on the genome. Double selection for the gain of +2 and loss of -1 selects for simultaneous gain of s. DNA and loss of genomic sequence, and improves recombination at the target genomic locus. b. Colony PCR of clones from classical recombination and simultaneous double selection and recombination. c. All of the clones isolated by simultaneous double selection and recombination have s. DNA integrated at the target locus. The data show the mean of three independent experiments; the error bars represent the standard deviation (n=6). d. Both simultaneous double selection recombination (n = 8), and REXER 2 and REXER 4 (n = 296) result in the right combination of markers. A previously reported method integrating foreign DNA into the B. subtilis genome using only negative selection gave 3% of selected clones with the right combination of markers. A previously reported method replacing S. cerevisiae chromosome III fragments with s. DNA using only positive selection gave 0.5% (replacement of 55 kb) to 59% (replacement of 9 kb) of selected clones with the right combination of markers (the 13% mean of all reported values is plotted, with the error bar representing the range). For gel source data, see Supplementary Figure 1.
Extended Data Figure 2
REXER enables site-specific integration of large DNA fragments into the genome. a. The use of two distinct double selection cassettes, -1/+1 (rpsL-KanR) and -2/+2 (sacB-CmR), allows for simultaneous selection for the loss of the negative selection marker on the genome and the gain of the positive selection marker from the BAC, upon integration of synthetic DNA. b. Efficient replacement of genomic rpsL-KanR with BAC-bound sacB-CmR using REXER 2 and REXER 4. All colonies contained the correct combination of selection markers after REXER 2 or REXER 4, as analysed by phenotyping, colony PCR, and DNA sequencing (not shown) (n = 22). c. Efficient insertion of 9 kb synthetic DNA. Genomic rpsL-KanR was replaced with a synthetic lux operon coupled to sacB-CmR using REXER 2 and REXER 4. All colonies on the 10-fold dilution double selection plates for REXER 2 and the 10^4-fold dilution plates for REXER 4 show bioluminescence. 11 colonies each from REXER 2 and REXER 4 showed correct integration by phenotyping, colony PCR, and DNA sequencing (not shown). d. Efficient insertion of 90 kb synthetic DNA. The 90 kb DNA consisted of the lux operon in the middle of 80 kb DNA (previously deleted from the MDS42 genome), followed by sacB-CmR, carried on a BAC. For gel source data, see Supplementary Figure 1.
Extended Data Figure 3
Replacement of 100 kb of genomic DNA via REXER. a. The synthetic DNA contains the 100 kb wildtype DNA (open reading frames in grey) with five genes of the lux operon (blue) and sacB-CmR. Complete replacement leads to integration of all five lux genes (luxA, B, C, D, E), resulting in bioluminescent cells, while partial replacement confers loss of one or more lux genes and hence loss of bioluminescence. b. After REXER 2, 80% of 2 × 10^2 colonies examined were bioluminescent, while after REXER 4, 50% of 2 × 10^2 colonies examined were bioluminescent. c. Bioluminescent colonies from REXER 2 and REXER 4 that were analysed (n = 11) had all five lux genes correctly integrated, indicating complete replacement of the 100 kb genomic region. All clones contained the right combination of selection markers. d. While bioluminescent colonies contain all five lux watermarks, the non-bioluminescent colonies are missing one or more lux genes, indicating partial replacement of the genomic region. All clones contained the right combination of selection markers. For gel source data, see Supplementary Figure 1.
Extended Data Figure 4
Iterative REXER. a. The product of REXER shown in Extended Data Figure 2a was used as a template for the next round of REXER. b. The phenotypes of clones from the first round of REXER. c. The phenotypes of clones from the second round of REXER. For gel source data, see Supplementary Figure 1.
Extended Data Figure 5
Synonymous codon compression strategies. a. Codon and anticodon interactions in the E. coli genome. 28 sense codons are highlighted in grey, along with the amber stop codon. The genome-wide removal of these sense codons, but not other sense codons, would enable all their cognate tRNAs to be deleted without removing the ability to decode one or more sense codons remaining in the genome. This is necessary but not sufficient for the reassignment of sense codons to unnatural monomers. Serine, leucine and alanine codon boxes are highlighted because the endogenous aminoacyl-tRNA synthetases for these amino acids do not recognize the anticodons of their cognate tRNAs. This may facilitate the assignment of codons within these boxes to new amino acids through the introduction of tRNAs bearing cognate anticodons that do not direct mis-aminoacylation by endogenous synthetases. The total codon counts for all 64 triplet codons in the MDS42 genome (GenBank accession no. AP012306), all known codon-anticodon interactions through both Watson-Crick base-pairing and wobbling, base modifications on tRNA anticodons, tRNA genes, and measured in vivo tRNA relative abundance are reported. This analysis identifies 10 codons from the serine, leucine, and alanine groups (serine codons TCG, TCA, AGT, AGC; leucine codons CTG, CTA, TTG, TTA; and alanine codons GCG, GCA) that satisfy both the codon-anticodon interaction and aminoacyl-tRNA synthetase recognition criteria for codon reassignment. b., c., d. Serine, leucine and alanine codon removal and tRNA deletion strategies compatible with codon reassignment to unnatural amino acids (u.a.a).
Extended Data Figure 6
Recoding landscapes for compression of serine codons by REXER. a. The sequences for the systematically recoded mraZ to ftsZ region were designed de novo, synthesized, assembled into a BAC, and used for REXER. b-d. The recoding landscapes for serine recoding schemes r.s.1-3, and the resulting compiled recoding landscape.
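As a rough illustration of what "systematic recoding" means at the sequence level, the sketch below replaces every in-frame occurrence of a target codon with one defined synonym. The function name `recode_orf` and the mapping `EXAMPLE_SCHEME` are hypothetical; the mapping is chosen only to be synonymous (serine to serine, plus the TAG to TAA change mentioned for Figure 3) and is not claimed to match any of the paper's schemes r.s.1-8.

```python
def recode_orf(orf: str, scheme: dict) -> str:
    """Replace each in-frame codon listed in `scheme` with its defined synonym.

    Assumes `orf` starts in frame and has a length that is a multiple of 3.
    """
    orf = orf.upper()
    if len(orf) % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = [orf[i:i + 3] for i in range(0, len(orf), 3)]
    # Codons not named in the scheme are left untouched.
    return "".join(scheme.get(codon, codon) for codon in codons)


# Illustrative mapping only (an assumption, not a scheme from the paper):
# two serine codons are replaced by serine synonyms, and TAG by TAA.
EXAMPLE_SCHEME = {"TCG": "AGC", "TCA": "AGT", "TAG": "TAA"}

recoded = recode_orf("ATGTCGAAATCATAG", EXAMPLE_SCHEME)
print(recoded)
```

Because the replacement works codon by codon in the reading frame, a target triplet that only appears out of frame (for example the TCG inside ATC-GAA) is not changed. Overlapping open reading frames, which the paper handles with duplicated regions, are outside the scope of this sketch.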
Extended Data Figure 7
Recoding landscapes. a-e. r.s.4-8. f. r.s.1 with ftsA codon 407 changed from AGT to AGC (highlighted in orange).
Extended Data Figure 8
Identifying and fixing a deleterious sequence in defined and systematic synonymous recoding. a. Recoding codon 407 in ftsA in the wildtype genomic background. The wildtype codon at ftsA codon position 407 is the serine codon TCG. We sequenced 16 post-REXER clones for TCG to AGT and 20 post-REXER clones for TCG to TCT. b. Changing ftsA 407 AGT to AGC in the serine r.s.1 background. We sequenced 16 AGT clones and 16 AGT to AGC clones. c. Changing ftsA 407 AGT to AGC in the serine r.s.1 background dramatically improved the fraction of fully recoded clones across the entire 20 kb region from 0% to 94% (16 clones sequenced). d. The fixed serine r.s.1 with ftsA 407 AGC yielded clones with no measurable growth defect. The doubling times of fully recoded clones from serine r.s.1 with ftsA 407 AGC, from serine r.s.2, serine r.s.3, and alanine r.s.7 were measured, and show no measurable growth defects when compared to the wildtype MDS42 E. coli control with the second double selection cassette integrated at the same genomic locus. n=12 biological replicates ± s.d. e. Combining single-stranded DNA recombineering with REXER to fix a short deleterious stretch within the synthetic sequence of r.s.1. A 90 nt single-stranded oligo was designed to change the deleterious sequence AGT at ftsA codon position 407 in r.s.1 to a tolerated sequence, AGC. The oligo sequence was designed based on the reverse strand of the synthetic sequence, to bind the forward strand, with the single nucleotide change positioned in the middle (45 nt from the 5′ end). The oligo was co-transformed into E. coli during a REXER experiment that introduces r.s.1 into the genome. f. Fixing a short deleterious sequence on synthetic DNA with REXER + ssDNA recombineering. 16 clones from the REXER double selection described in (e) were randomly picked and subjected to single nucleotide polymorphism (SNP) genotyping using primers specific for either the wildtype sequence at ftsA codon position 407 (TCG) or the fixed sequence (AGC). MDS42 rpsLK43R/rK was used as the wildtype control and a fully recoded clone from serine r.s.3 with verified ftsA 407 AGC as the positive control. SNP genotyping at ftsA codon position 407 identified one clone (clone 12, highlighted in orange) out of a total of 16 clones tested with the fixed sequence AGC, which was then fully sequenced across the entire 20 kb recoding region and confirmed as fully recoded at all 83 targeted codon positions. For gel source data, see Supplementary Figure 1.
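The oligo design described in panel e can be sketched in a few lines. This is a minimal illustration, not the authors' design tool: the function names are hypothetical, and it assumes that "45 nt from the 5′ end" means the changed base is the 45th base of the 90 nt oligo (44 bases precede it). The toy sequence below is made up; only the AGT to AGC change comes from the caption.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def design_repair_oligo(forward: str, pos: int, new_base: str, length: int = 90) -> str:
    """Design a reverse-strand oligo carrying a single-base change.

    `forward` is the forward strand of the synthetic sequence and `pos` is the
    0-based index of the base to change. The window is placed so that the
    changed base ends up as base number length // 2 of the returned oligo
    (an assumption about how "in the middle" is counted).
    """
    start = pos - length // 2
    end = start + length
    if start < 0 or end > len(forward):
        raise ValueError("not enough flanking sequence around the change")
    window = forward[start:pos] + new_base + forward[pos + 1:end]
    # The oligo is the reverse strand, so it can anneal to the forward strand.
    return reverse_complement(window)


# Toy forward strand (hypothetical): AGT flanked by filler sequence.
toy_forward = "A" * 50 + "AGT" + "G" * 50
oligo = design_repair_oligo(toy_forward, pos=52, new_base="C")  # AGT -> AGC
print(len(oligo), oligo[44])
```

Reverse-complementing the returned oligo recovers the forward-strand window with AGC in place of AGT, which is a quick way to check a design before ordering.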
Figure 1
Efficient, programmable insertion of very long synthetic DNA (s. DNA) into the genome of E. coli. a. REXER 2 and REXER 4. CRISPR protospacer sequences are blue and orange rectangles respectively. Triangles indicate spacer RNAs that program cleavage within colour-matched protospacers. REXER 4 augments REXER 2 by adding two extra protospacers (purple and red rectangles), and triggering cleavage with four spacer RNAs. +1 is KanR, -1 is rpsL, +2 is CmR, -2 is sacB. b. REXER 2 and REXER 4 are dependent on the CRISPR/Cas9 system and recombination. Controls omit either spacer RNA or lambda red beta. Data show mean (n=4-6, ± s.d.). c. The efficiency of REXER 2 and REXER 4 is constant for insertions between 2 kb and 90 kb. c.f.u., colony forming units. The data show the mean (n=6, 3 biological replicates performed in duplicate, ± s.d.) for the 2 kb insertion, and the mean (n=4, 2 biological replicates performed in duplicate, ± s.d.) for the 9 kb and 90 kb insertions. It was not possible to obtain a 90 kb linear dsDNA product in vitro for classical lambda red recombination, and our data reflect this, rather than the efficiency of recombination per se. It is well established that lambda red recombination efficiency falls off rapidly with linear dsDNA length.
Figure 2
Iterating REXER for genome stepwise interchange synthesis (GENESIS). a. Iterative genomic replacement by REXER will enable genome replacement in fewer than 40 linear steps. b. Iterative REXER replaces 220 kb of the E. coli genome with 230 kb of synthetic DNA in two steps. LuxA, B, C, D, E (cyan rectangles) are necessary and sufficient for luminescence. hph (violet rectangle) is the hygromycin B phosphotransferase gene, conferring resistance to hygromycin B. c. Cells phenotype correctly through rounds of REXER. Shown are the parental cell line (genomewt), independent clones from the 1st round of REXER (clones A and B), and independent clones from the 2nd round of REXER (clones C and D). Lumi, luminescence; Cm, chloramphenicol; Kan, kanamycin; Suc, sucrose; Strep, streptomycin. d. Cells genotype correctly through rounds of REXER. For gel source data, see Supplementary Figure 1.
Figure 3
Systematic and defined synonymous codon reassignment in an E. coli operon rich in essential genes. a. Identifying codons targeted for removal (grey) and the synonyms to which they are reassigned (pink) in each recoding scheme. Lines indicate codon-anticodon interactions. Replacements were chosen by cAi, tAi, or t.E. Application of each recoding scheme genome-wide would allow the targeted codons to be completely removed from the E. coli genome and, following deletion of the cognate tRNA genes, codon reassignment to orthogonal translation systems for unnatural polymer synthesis. b. Identifying a target operon rich in target codons and essential genes to test recoding schemes. The top panel indicates the positions of essential genes. In the bottom three panels, the y axis scores the number of the indicated target codons in essential genes at the genomic position indicated on the x axis. The mraZ to ftsZ region (coloured in red) was identified in the highest-scoring 20 kb region across the E. coli MDS42 genome for all targeted codons. c. Position and density of targeted codons in the mraZ to ftsZ region. The positions of targeted codons (the indicated sense codons plus TAG to TAA) are coloured in red; pink regions with red outlines indicate duplicated regions (d.r.s), which refactor overlapping open reading frames to enable independent recoding of the downstream open reading frames.
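The window search described in panel b can be approximated with a simple scan. The sketch below is an assumption about one way to score windows, not the authors' procedure: it takes a list of genomic coordinates of target codons that fall in essential genes (hypothetical input) and returns the start of the 20 kb window containing the most of them. It treats the genome as linear, ignoring circularity.

```python
def best_window(positions, window=20_000):
    """Return (start, count) for the window of `window` bp with the most targets.

    `positions` are genomic coordinates of target codons in essential genes.
    Only windows that start at a target position are tried; any other window
    can be slid right to the next target without losing any counted position.
    """
    pts = sorted(positions)
    best_start, best_count = None, 0
    right = 0
    for left, start in enumerate(pts):
        # Advance the right edge while targets still fall inside the window.
        while right < len(pts) and pts[right] < start + window:
            right += 1
        count = right - left
        if count > best_count:
            best_start, best_count = start, count
    return best_start, best_count


# Made-up coordinates for illustration only.
toy_positions = [100, 5_000, 19_000, 50_000, 51_000, 52_000, 60_000, 69_999]
print(best_window(toy_positions))
```

Scoring several target codon sets at once, as the caption describes for "all targeted codons", could be done by pooling their coordinates before calling the function; how the paper combined the scores is not stated here.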
Figure 4
Compiled recoding landscapes of targeted codons reveal allowed and disallowed synonymous recoding schemes, and enable the identification and repair of idiosyncratic positions in the genome. The fraction of recoding across sixteen independent sequences is indicated on the y axis of the graphs. Codon positions that are not recoded with the indicated scheme are in black. a., b., c. Compiled recoding landscapes of targeted serine, leucine and alanine codons respectively. d. Identifying and fixing a deleterious sequence in defined and systematic synonymous recoding. The compiled recoding landscape of serine r.s.1 is plotted in red, revealing the single position at which the wildtype sequence is maintained, codon 407 in ftsA. The compiled recoding landscape of serine r.s.1 with ftsA 407 AGT changed to AGC (as in serine r.s.2 and r.s.3) is plotted in orange. This mutation repairs the deleterious effect of ftsA 407 AGT without reintroducing the codons targeted for removal.
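A compiled recoding landscape, as used here, is the per-position fraction of sequenced clones that carry the designed codon. The sketch below shows that calculation under simplifying assumptions: clone sequences are already aligned to the designed sequence with no indels, and `targets` holds 0-based start indices of the targeted codons. All names and the toy sequences are hypothetical.

```python
def recoding_landscape(designed: str, clones: list, targets: list) -> dict:
    """Map each targeted codon start to the fraction of clones that are recoded.

    A clone counts as recoded at a position when its codon there matches the
    codon in the designed (synthetic) sequence.
    """
    if not clones:
        raise ValueError("at least one clone sequence is required")
    landscape = {}
    for start in targets:
        wanted = designed[start:start + 3]
        hits = sum(clone[start:start + 3] == wanted for clone in clones)
        landscape[start] = hits / len(clones)
    return landscape


# Toy data: the designed sequence recodes codons starting at indices 3 and 9.
designed_seq = "ATGAGCAAAAGTTAA"
toy_clones = [
    "ATGAGCAAAAGTTAA",  # fully recoded
    "ATGAGCAAAAGTTAA",  # fully recoded
    "ATGAGCAAAAGTTAA",  # fully recoded
    "ATGAGCAAATCATAA",  # wildtype codon retained at index 9
]
landscape = recoding_landscape(designed_seq, toy_clones, targets=[3, 9])
print(landscape)
```

A position where the fraction drops to zero across all clones, as reported for ftsA codon 407 in serine r.s.1, would stand out in such a table as a candidate disallowed substitution.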

Comment in

  • Hacking rules for E. coli.
    Jarchum I. Nat Biotechnol. 2016 Dec 7;34(12):1249. doi: 10.1038/nbt.3744. PMID: 27926721.
