Total synthesis of Escherichia coli with a recoded genome

Nature. 2019 May;569(7757):514-518. doi: 10.1038/s41586-019-1192-5. Epub 2019 May 15.

Julius Fredens^#¹, Kaihang Wang^#¹,², Daniel de la Torre^#¹, Louise F H Funke^#¹, Wesley E Robertson^#¹, Yonka Christova¹, Tiongsun Chia¹, Wolfgang H Schmied¹, Daniel L Dunkelmann¹, Václav Beránek¹, Chayasith Uttamapinant¹,³, Andres Gonzalez Llamazares¹, Thomas S Elliott¹, Jason W Chin⁴

Affiliations

¹ Medical Research Council Laboratory of Molecular Biology, Cambridge, UK.
² Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA.
³ School of Biomolecular Science and Engineering, Vidyasirimedhi Institute of Science and Technology (VISTEC), Rayong, Thailand.
⁴ Medical Research Council Laboratory of Molecular Biology, Cambridge, UK. chin@mrc-lmb.cam.ac.uk.

^# Contributed equally.

PMID: 31092918
PMCID: PMC7039709
DOI: 10.1038/s41586-019-1192-5

Abstract

Nature uses 64 codons to encode the synthesis of proteins from the genome, and chooses 1 sense codon (out of up to 6 synonyms) to encode each amino acid. Synonymous codon choice has diverse and important roles, and many synonymous substitutions are detrimental. Here we demonstrate that the number of codons used to encode the canonical amino acids can be reduced, through the genome-wide substitution of target codons by defined synonyms. We create a variant of Escherichia coli with a four-megabase synthetic genome through a high-fidelity convergent total synthesis. Our synthetic genome implements a defined recoding and refactoring scheme, with simple corrections at just seven positions, to replace every known occurrence of two sense codons and a stop codon in the genome. Thus, we recode 18,214 codons to create an organism with a 61-codon genome; this organism uses 59 codons to encode the 20 amino acids, and enables the deletion of a previously essential transfer RNA.

Conflict of interest statement

Competing interests

The authors declare no competing interests.

Figures

**Extended Data Fig. 1. Using 100-kb fragments of synthetic DNA to replace the corresponding regions in the genome through REXER, and using GENESIS for the stepwise replacement of genomic DNA by synthetic DNA to generate recoded sections.**
a, REXER uses CRISPR–Cas9- and lambda-red-mediated recombination to replace genomic DNA with synthetic DNA provided from an episome (BAC). This enables large regions of the genome (>100 kb) to be replaced by synthetic DNA. The black triangles denote the location of CRISPR protospacers, which are cleaved by Cas9 to liberate the synthetic DNA (pink) cassette from the BAC flanked by homology regions. Homology regions 1 and 2 program the location of recombination into the *E. coli* genome. The double-selection cassette (−1, +1) ensures the integration of the synthetic DNA, and the double-selection cassette (−2, +2) on the genome ensures the removal of the corresponding wild-type DNA. In the example shown in the figure, +1 is *kanR*, −1 is *rpsL*, +2 is *cat* and −2 is *sacB*. b, Iterative cycles of REXER, with alternating choices of positive- and negative-selection cassettes, enable GENESIS. This allows large sections of the synthetic genome to be assembled through the iterative addition of fragments, which replace the corresponding genomic sequences, in a clockwise manner. The first REXER of a 100-kb synthetic fragment of DNA leaves a −1, +1 double-selection cassette on the genome, which acts as a landing site for the downstream integration of a second fragment of synthetic DNA that contains a −2, +2 double-selection cassette. In the example shown, +1 is *kanR*, −1 is *rpsL*, +2 is *cat* and −2 is *sacB*, but the same logic can be used with different permutations of positive and negative selection markers on the genome and the BAC.
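The alternating selection logic of panels a and b can be sketched in a few lines. This is a hypothetical Python illustration, not the authors' software: it assumes only the two cassette pairs named in the example (+1 *kanR*/−1 *rpsL*, +2 *cat*/−2 *sacB*) and that a −2, +2 cassette already sits at the first landing site. The names `CASSETTES` and `genesis_plan` are invented for this sketch.

```python
# Hypothetical sketch of GENESIS marker alternation, assuming only the two
# cassette pairs from the example in the legend.
CASSETTES = {
    1: ("kanR", "rpsL"),  # (+1 positive marker, -1 negative marker)
    2: ("cat", "sacB"),   # (+2 positive marker, -2 negative marker)
}


def genesis_plan(fragments, genome_cassette=2):
    """Return (fragment, positive selection, negative selection) per REXER step.

    genome_cassette is the pair already on the genome at the landing site.
    Each incoming fragment carries the other pair: selecting for its positive
    marker demands integration of the synthetic DNA, and selecting against the
    resident pair's negative marker demands removal of the wild-type DNA.
    """
    plan = []
    for fragment in fragments:
        incoming = 1 if genome_cassette == 2 else 2
        positive, _ = CASSETTES[incoming]
        _, negative = CASSETTES[genome_cassette]
        plan.append((fragment, f"+{incoming} ({positive})", f"-{genome_cassette} ({negative})"))
        # The newly integrated cassette is the landing site for the next fragment.
        genome_cassette = incoming
    return plan


for step in genesis_plan(["fragment 1", "fragment 2", "fragment 3"]):
    print(step)
```

Because each step consumes the resident cassette and deposits the other one, the selections simply alternate; in practice the study also used other pairs (for example *pheS\**-HygR), which this sketch does not model.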

**Extended Data Fig. 2. Recoding *ftsI-murE* and *map* in fragment 1.**
a, Recoding landscape of fragment 1. We sequenced six clones after REXER. Each dot represents the frequency of recoding within the sequenced clones (y axis) for a target codon at the indicated position in the genome (x axis). Black dots indicate positions at which we did not observe recoding. Four codons and a refactoring of *ftsI-murE*, and one codon in *map*, were rejected. b, Refactoring the 14-bp overlap of *ftsI* and *murE*. The codons and overlaps are colour-coded by their post-REXER replacement frequency in the clones sequenced. Using our initial refactoring scheme (refactoring 1), in which the overlap plus 20 bp of upstream sequence was duplicated, we did not observe replacement of the overlap by synthetic DNA in the six clones sequenced after REXER. Refactoring scheme 2 (refactoring 2), which duplicates the overlap plus 182 bp of upstream sequence, resulted in complete recoding of this region in 12 of the 16 post-REXER clones that we sequenced. c, Testing alternative codons at Ser4 in *map*. A double-selection cassette, *pheS\**-HygR, expressed from a constitutive EM7 promoter and followed by a ribosome-binding site, was introduced upstream of *map*. We replaced the cassette using linear double-stranded DNA that introduces alternative codons (purple bar) at position 4, via lambda-red recombination and negative selection for loss of *pheS\**. DNA with AGC and AGT did not integrate (0/16 clones); we recovered one clone for AGC but sequencing revealed that it contained a mutant AAC (Asn) codon. TCT (6/8), TCC (6/16), ACA (6/8) and TTA (4/8) were allowed. d, Recoding landscape (purple) over the genomic region shown in a, following REXER with a BAC that contained refactoring scheme 2 for the *ftsI-murE* overlap and TCT at position 4 in *map*. In total, 2/7 post-REXER clones were completely refactored and recoded, and each target codon was replaced in at least 5/7 clones. The data from a are shown in red for comparison.
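A recoding landscape like the one in panel a is a per-position frequency over sequenced clones. The sketch below assumes a hypothetical input format (for each target-codon position, one boolean per clone, True if that clone carries the synthetic codon) and invented function names; it is an illustration of the calculation, not the authors' analysis pipeline.

```python
# Hypothetical input: {genome position: [recoded? for each sequenced clone]}.

def recoding_landscape(calls):
    """Map each target position to the fraction of clones recoded there."""
    return {pos: sum(clones) / len(clones) for pos, clones in calls.items()}


def never_recoded(calls):
    """Positions recoded in none of the clones (plotted as black dots)."""
    return sorted(pos for pos, clones in calls.items() if not any(clones))


def fully_recoded_clones(calls):
    """Indices of clones that carry the synthetic codon at every target position."""
    n_clones = len(next(iter(calls.values())))
    return [i for i in range(n_clones) if all(clones[i] for clones in calls.values())]


calls = {
    1200: [True] * 6,
    5400: [True, False, True, True, False, True],
    9100: [False] * 6,  # a consistently rejected codon
}
print(recoding_landscape(calls))
print(never_recoded(calls), fully_recoded_clones(calls))
```

In this toy example position 9100 is never recoded, so no clone is fully recoded; that is the situation that prompted the troubleshooting in panels b and c.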

**Extended Data Fig. 3. Recoding *rne* and *yceQ* in fragment 9.**
a, Recoding landscape of fragment 9. Our designed synthetic sequence of fragment 9 was integrated into the genome by REXER, and 19 clones were completely sequenced by next-generation sequencing. The recoding landscape graph shows the frequency at which each target codon was recoded across the 19 clones. Although most codon replacements were accepted, recoding of a 26-kb region was consistently rejected; codon positions with a recoding frequency of zero in all the sequenced clones are indicated by black dots. To pinpoint the problematic sequence, 10-kb stretches of the genome (labelled G2 to G7) were deleted in the presence of the episomal copy of synthetic fragment 9. The synthetic sequence was sufficient to support deletion of all stretches except G4 (dark grey box), which suggested that the underlying problem lay within this stretch. None of the nineteen clones was completely recoded. b, Recoding landscape of stretch G4. After REXER across the 10-kb G4 stretch, and sequencing of 10 clones, the recoding landscape shown was generated. This revealed a clear recoding minimum at *yceQ*, a ‘gene’ that encodes a predicted protein for which there is little evidence of transcription, protein synthesis or homologues. All target codons in *yceQ* were recoded at least once in individual clones, but never simultaneously; thus, the minimum of the recoding landscape does not reach zero, and 0/10 clones were completely recoded. This is consistent with epistasis between the targeted positions. In the map below the recoding landscape, sequences annotated as essential are shown in dark grey and target codons are shown in red. The sequence position (x axis) is with reference to a. c, Altered design of the region surrounding *rne* in fragment 9. Top, original design of *yceQ* recoding and *rne* (which encodes RNase E) regulatory sequences. Target codons are shown in red. 
P1rne, P2rne and P3rne are the promoters (blue arrows) for the essential gene *rne*; these are found in and around the hypothetical gene *yceQ*. The −10 sequence of the major promoter P1rne is mutated by our initial design. The sequences that contain hairpin 1 (hp1) and hairpin 2 (hp2), which bind to RNase E to mediate transcript degradation, are shown as blue bars; these sequences encompass the remaining target codons and are also mutated by our initial design. Bottom, the second codon in *yceQ* was replaced with a stop codon (purple) and the remaining target codons retained their original sequence. The sequence position (x axis) is with reference to a. d, The modified fragment 9 (from c) was integrated on the genome, which resulted in complete recoding in 4/5 clones that we sequenced. The axes of the graph are the same as in a. The recoding landscape for the modified fragment 9, derived from sequencing five clones, is shown in purple. The data from a are reproduced for comparison.
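The *yceQ* pattern in panel b (every target codon recoded in some clone, but no clone recoded at all of them) can be checked mechanically. The following is a hypothetical sketch with an invented function name and input layout (one list of booleans per clone); such a pattern is only consistent with epistasis between positions and does not prove it.

```python
def epistasis_signature(clone_calls):
    """clone_calls: one list per clone, with one boolean per target codon
    (True = recoded).

    Returns True when every target position is recoded in at least one clone
    (so the landscape minimum stays above zero) while no single clone is
    recoded at all positions.
    """
    every_position_hit = all(any(column) for column in zip(*clone_calls))
    some_clone_complete = any(all(clone) for clone in clone_calls)
    return every_position_hit and not some_clone_complete


# Three target codons, three clones: each codon is recoded somewhere, never all together.
print(epistasis_signature([[True, True, False], [False, True, True], [True, False, True]]))
```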

**Extended Data Fig. 4. Recoding *yaaY* in fragment 37a.**
a, Recoding landscape of fragment 37a. Our designed synthetic sequence of fragment 37a was integrated into the genome by REXER, and six clones were completely sequenced by next-generation sequencing. Although most codon replacements were accepted, recoding of a 6.5-kb region was consistently rejected. Target-codon positions that were never recoded in the six clones sequenced are indicated by black dots. b, Identification of the problematic target codon. Within the identified 6.5-kb problematic region, we first focused on codons in essential genes (dark grey arrows) rather than non-essential genes (light grey arrows). Sanger sequencing (black bar) of 24 clones showed that 2 clones were recoded at all 6 target codons within a sub-section of the essential genes. Further Sanger sequencing of the remaining target codons in essential genes in these two clones revealed that 1 clone was recoded at all 17 target codons. This clone was completely sequenced by next-generation sequencing and used to generate a recoding landscape, in which each target codon is either recoded (red) or not recoded (black). In combination with the recoding landscape in a, this enabled us to identify a problematic region 1.8 kb upstream of *ribF*. Here we focused on the four target codons in the genes *rpsT* and *yaaY* as the nearest codons to the essential *ribF* gene. Sanger sequencing of 33 clones across this sequence revealed only 1 codon that was never recoded: the codon for Ser70 in the hypothetical gene *yaaY* (sequencing results are colour-coded on the gene map of *rpsT* and *yaaY*). We therefore investigated alternative codon replacements in *yaaY*. c, Alternative codon replacement in the hypothetical gene *yaaY*. At position Ser70 in this gene, replacement of TCA with AGT was not successful. 
To investigate alternative codon replacement schemes, a double-selection marker (*pheS\**-HygR) on a constitutive EM7 promoter, followed by a ribosome-binding site, was introduced into *yaaY*, 12 bp upstream of the codon for Ser70. The negative-selection marker was then used to select for clones that had replaced the cassette using linear double-stranded DNA that introduces alternative codons (purple bar) at position 70, via lambda-red recombination. Although linear double-stranded DNA with AGT did not integrate (0/16 clones), integration of double-stranded DNA with TCC (2/16), TCG (2/16), TCT (6/16) and AGC (9/16) proved viable. d, Recoding landscape following REXER with a BAC that contains a corrected version of fragment 37a, bearing AGC at position Ser70 in the hypothetical gene *yaaY* (purple). After REXER with this BAC, 1/7 clones were completely recoded, and AGC at position Ser70 in *yaaY* was introduced in 4/7 clones.

**Extended Data Fig. 5. Substitutions in the hypothetical gene *yceQ* overlap with regulatory elements in *rne*.**
a, In our original design, a programmed substitution of a TCA (blue) to AGT (red) in the hypothetical gene *yceQ* leads to mutation of the −10 region of the P1rne promoter (boxed). The transcriptional start site (tss) of this promoter is indicated by an arrow; this is the major promoter for *rne* transcription. b, Target-codon substitutions overlap with, and may disrupt, the key regulatory hairpins (hp2 and hp3) in the long 5′ untranslated region of the *rne* transcript. hp2 and hp3 mediate a regulatory feedback loop, in which RNase E is recruited to the mRNA to promote degradation of its own transcript. A schematic of the wild-type secondary structure of the *rne* 5′ untranslated region is shown. The target codons for synonymous replacement are highlighted in blue.

**Extended Data Fig. 6. Completing sections A, B and H.**
a, GENESIS was initiated with fragment 4 and proceeded smoothly until fragment 9, in which we were unable to recode *yceQ*. Identifying and fixing the problems with our initial design of fragment 9 was carried out as described in Extended Data Fig. 3, by introducing a stop codon (yellow line) at the start of the predicted *yceQ* ORF. Following a swap of the *sacB-cat* (sC) double-selection cassette at the end of fragment 9 for a *pheS\**-HygR (pH) double-selection cassette, this strain was ready to act as the recipient for conjugation to assemble a strain in which fragments 4–13 (section A plus section B) are fully recoded. In parallel, we continued GENESIS from the strain containing recoded fragments 4–8 and an incompletely recoded fragment 9; this generated a second strain for assembly in which fragments 4–8 and 10–13 were completely recoded, and fragment 9 was partially recoded. We then integrated *oriT* (white triangle) 3 kb upstream of the start of fragment 10 in the second strain to generate a donor for conjugation. Conjugation of the donor and recipient strains resulted in a strain in which sections A and B are fully recoded. rK, *rpsL-kanR* double-selection cassette. b, Individual REXER of fragments 37a and 1 led to incomplete recoding. We carried out troubleshooting of both fragments independently (Extended Data Figs. 2, 4). The repairs are indicated with yellow and purple lines in fragment 37a and fragment 1, respectively. Each strain then served as a starting point for two independent sets of GENESIS; one generated 37a–37b (on the left) and ended in an *rpsL-kanR* double-selection cassette, and one generated 1–3 (on the right) and ended in a *sacB-cat* double-selection cassette. We integrated an *oriT* (white triangle) 3 kb upstream of the start of fragment 1, and this strain served as a donor for the directed conjugation of 1–3 into 37a–37b. 
The correct product was selected for by the gain of *cat* and the loss of *rpsL*. This resulted in the completion of section H in a single strain.

**Extended Data Fig. 7. Assembly of an organism with a fully synthetic genome through conjugation of recoded genome sections.**
a, Schematic assembly of partially synthetic donor and recipient genomes into a more-synthetic genome, through conjugation. In the recipient cell, the recoded genome section (pink) is extended with recoded DNA (dark pink), commonly 3–4 kb, by lambda-red-mediated recombination and positive and negative selection; this step takes advantage of the genomic markers at the end of the recoded sequence that are introduced by GENESIS, and provides a homology region with the end of the recoded fragment in the donor strain. The donor strain is prepared by integration of an *oriT* at the end of the recoded DNA. The indicated positive and negative selection ensures the survival of recipient strains, and selects for recipients that have successfully integrated the synthetic DNA from the donor. An F′ plasmid that contains a mutation in the *oriT* sequence that makes it non-transferable was used to facilitate conjugation of the donor genome to the recipient. +2, *cat*; −2, *sacB*; +3, *HygR*; −3, *pheS\**; +4, *aacC1* (a gene conferring gentamicin resistance); +5, *tetA* (a gene conferring tetracycline resistance). The homologous regions in the donor and recipient are both shown in dark pink. b, Synthetic genomic sections (pink) from multiple individual partially recoded genomes were assembled into a single fully recoded genome using conjugative assembly. The donor (d) and recipient (r) strains contain unique recoded genomic sections labelled in pink; recoded overlapping homology regions (3 kb to 400 kb in size) were used to seamlessly recombine the strains, and are shown in dark pink. Small homology regions ranging from 3 to 5 kb in size are denoted with an asterisk. Conjugations for which we used greater than 5-kb homology (HR) are indicated. For assembly, the recoded genomic content from the donor was conjugated in a clockwise manner to replace the corresponding wild-type genomic section (grey) in the recipient. 
The origin of strain AB and strain H is described in detail in Extended Data Fig. 6; all other individual synthetic genomes were generated by GENESIS (Extended Data Fig. 1). Conjugation followed by recombination proceeded until the final fully recoded A–H strain was assembled and sequence-verified by next-generation sequencing.

**Extended Data Fig. 8. Characterization of an organism with a fully synthetic genome.**
a, Doubling times for Syn61 and MDS42. Our fully synthetic recoded *E. coli* Syn61 has a doubling time that is 1.6x longer than that of MDS42, when grown in standard medium conditions (90.1 min versus 57.6 min in lysogeny broth (LB) + 2% glucose). The ratio of growth rates between Syn61 and MDS42 in LB (decreased carbon catabolite repression) at 37 °C is 1.7, in M9 minimal medium is 1.7, in richer medium (2XTY) is 1.4, in LB at 25 °C is 2.5 and in LB at 42 °C is 1.3. The doubling times in different medium conditions are: LB at 37 °C, 58.3 min and 100.6 min; LB + 2% glucose, 57.6 min and 90.1 min; M9 minimal medium, 130.5 min and 221.1 min; 2XTY, 68.2 min and 92.6 min; LB at 25 °C, 86.3 min and 218.4 min; LB at 42 °C, 77.4 min and 99.7 min, for MDS42 and Syn61, respectively. Syn61 containing a plasmid without (−) or with (+) *serV* exhibited a growth-rate ratio of 0.99 (138.3 min versus 136.2 min). Doubling times represent the average of ten independently grown biological replicates of each strain, and are shown as mean ± s.d. (see Supplementary Methods). The data for individual experiments are represented by dots. b, Representative microscopy images of *E. coli* strain MDS42 and Syn61. Samples were imaged on an upright Zeiss Axiophot phase-contrast microscope using a 63X 1.25 NA Plan Neofluar phase objective (see Supplementary Methods). The experiment was performed twice with similar results. c, Histogram of cell lengths quantified from microscopy images of strains MDS42 and Syn61. The mean cell length (±s.d.) for MDS42 was 1.97 ± 0.57 μm and for Syn61 was 2.3 ± 0.74 μm. Images of n = 500 cells were taken during exponential growth phase for both strains. Cell-length measurements were made using Nikon NIS Elements software (see Supplementary Methods). A 1-μm lower size limit was imposed to remove background particulates and dust from quantification; this also precludes quantification of extracellular vesicles. 
d, Label-free quantification of the MDS42 and Syn61 proteomes. Each strain was grown in three biological replicates. Each biological replicate was analysed by tandem mass spectrometry in technical duplicate. Technical duplicates of biological replicates were merged. A total of 1,084 proteins was quantified across the samples. No protein quantified in both MDS42 and Syn61 differed in abundance—as judged by label-free quantification values—by more than 1.16-fold.
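Because growth rate is inversely proportional to doubling time (μ = ln 2 / t_d), the "ratio of growth rates" in panel a equals the Syn61/MDS42 ratio of doubling times. The short Python check below uses only the doubling times listed in the legend; the dictionary and function names are illustrative.

```python
# Doubling times in minutes from panel a, as (MDS42, Syn61).
DOUBLING_TIMES = {
    "LB, 37 °C": (58.3, 100.6),
    "LB + 2% glucose": (57.6, 90.1),
    "M9 minimal medium": (130.5, 221.1),
    "2XTY": (68.2, 92.6),
    "LB, 25 °C": (86.3, 218.4),
    "LB, 42 °C": (77.4, 99.7),
}


def doubling_time_ratio(mds42_min, syn61_min):
    """Syn61/MDS42 doubling-time ratio, i.e. the MDS42/Syn61 growth-rate ratio."""
    return syn61_min / mds42_min


for condition, (mds42, syn61) in DOUBLING_TIMES.items():
    print(f"{condition}: {doubling_time_ratio(mds42, syn61):.1f}")
```

Rounded to one decimal place this prints 1.7, 1.6, 1.7, 1.4, 2.5 and 1.3, matching the ratios stated in the legend.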

**Extended Data Fig. 9. Consequences of synonymous codon compression in Syn61.**
a, Synonymous codon compression and deletion of *prfA*, *serU* and *serT* in *E. coli*. The grey boxes show the *E. coli* serine codons and stop codons, together with the tRNAs and release factors that decode them in wild-type *E. coli* (WT genome). tRNA anticodons and release factors are connected to the codons that they are predicted to read by black lines. The tRNA and release factor genes are shown in the black boxes. Synonymous codon compression (syn. codon comp.) leads to Syn61 cells with a recoded genome (pink boxes), in which TCG and TCA codons are removed. The abundance of each codon is listed in its box. b, As in Fig. 4b, but with the *M. mazei* PylRS/tRNA^Pyl_UGA pair (anticodon UGA). There are fewer cognate codons to this anticodon in Syn61 than in MDS42; CYPK addition might therefore be expected to be less toxic in Syn61, as observed. c, As in Fig. 4b, but with the *M. mazei* PylRS/tRNA^Pyl_GCU pair (anticodon GCU). There are more cognate codons to this anticodon in Syn61 than in MDS42; CYPK addition might therefore be expected to be more toxic in Syn61, as observed. d, *serT* (dark grey) is deleted by insertion of a *PheS\**-HygR double-selection cassette (black) via lambda-red-mediated recombination. Recombination yields new junctions 1 and 2, indicated by green and blue bars. For each recombination, both junctions were sequence-verified by Sanger sequencing. Above the Sanger chromatograms, the arrows indicate the precise location of the junction, the blue bar indicates the sequence that corresponds to the selection cassette and the green bar corresponds to the genomic sequence that flanks the selection cassette. The primers used to generate selection cassettes with suitable homologies to *serU*, *serT* and *prfA* for recombination are provided in Supplementary Data 21. The experiment was performed once. 
e, *prfA* (dark grey) is deleted by the insertion of an *rpsL-kanR* double-selection cassette (in black) via lambda-red-mediated homologous recombination. The agarose gels are annotated as described in Fig. 4c, and the rest of the data are annotated as described in d. The experiment was performed once. f, *serU* (dark grey) is deleted by insertion of a *PheS\**-HygR double-selection cassette (in black) via lambda-red-mediated recombination. The agarose gels are annotated as described in Fig. 4c, and the rest of the data are annotated as described in d. The experiment was performed once. The full gels are available in Supplementary Fig. 1.
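The codon counts behind panels b and c follow from which codon each anticodon reads. The hypothetical Python sketch below derives the cognate DNA codon for an RNA anticodon by strict Watson-Crick pairing (wobble pairing and tRNA modifications are deliberately ignored, which is a simplification) and counts in-frame occurrences in a toy gene before and after recoding. The sequences and function names are invented for illustration.

```python
# Watson-Crick pairing from RNA anticodon base to DNA codon base (wobble ignored).
PAIR = {"A": "T", "U": "A", "G": "C", "C": "G"}


def cognate_codon(anticodon):
    """DNA codon (5'->3') read by an RNA anticodon written 5'->3'."""
    return "".join(PAIR[base] for base in reversed(anticodon.upper()))


def count_in_frame(cds, codon):
    """Number of in-frame occurrences of codon in a coding sequence."""
    return sum(1 for i in range(0, len(cds) - 2, 3) if cds[i:i + 3] == codon)


wild_type = "ATGTCATCGAGCTAG"  # toy gene: Met-Ser(TCA)-Ser(TCG)-Ser(AGC)-amber stop
recoded = "ATGAGTAGCAGCTAA"    # same gene after TCA->AGT, TCG->AGC, TAG->TAA
for anticodon in ("CGA", "UGA", "GCU"):
    codon = cognate_codon(anticodon)
    print(anticodon, codon, count_in_frame(wild_type, codon), count_in_frame(recoded, codon))
```

This prints `CGA TCG 1 0`, `UGA TCA 1 0` and `GCU AGC 1 2`: recoding removes the targets of the CGA and UGA anticodons and adds targets for GCU, the direction of change described for panels b and c.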

**Extended Data Fig. 10. The scale of genome synthesis, and scale and fidelity of recoding.**
a, Genome and chromosome synthesis. The size (in Mb) of synthetic genomes that have been produced for *M. genitalium* and *M. mycoides*, and several *S. cerevisiae* chromosomes (light grey). The size of the synthetic *E. coli* genome presented here is shown in dark grey. b, Genome recoding efforts. Attempts to recode target codons TTA and TTG in *Salmonella enterica* serovar Typhimurium LT2; AGC, AGT, TTG, TTA, AGA, AGG and TAG in *E. coli*; AGA and AGG in *E. coli*, as well as recoding of all TAG in *E. coli* (light grey), compared to the removal of all TCA, TCG and TAG in *E. coli* presented here (dark grey). The total number of codons recoded in a single strain is shown on the graph, and the maximum percentage of target codons recoded in a single strain in each effort is indicated. c, Number of reported non-programmed mutations and indels as a function of the number of target codons recoded for the experiments shown in b.

**Fig. 1. Design of the synthetic genome, implementing a defined recoding scheme for synonymous codon compression.**
a, The defined recoding scheme for synonymous codon compression. Synonymous serine codons and three stop codons used in the genome of wild-type *E. coli* are shown (grey boxes). Systematically implementing a defined recoding scheme for synonymous codon compression (red arrows) recodes target codons to defined synonyms, and replaces the amber stop codon TAG with the ochre stop codon TAA. This creates an organism with a recoded genome that uses a reduced number of serine and termination codons (pink boxes). b, Refactoring of 3′, 3′ overlaps enables their independent recoding. The overlap between two ORFs (ORF1 and ORF2) is duplicated, which enables independent recoding (red box) of these ORFs. c, Refactoring 5′, 3′ overlaps. The overlap plus 20 bp upstream is duplicated to generate a synthetic insert. When the overlap is longer than 1 bp at the end of the upstream ORF, an in-frame TAA (black box) is introduced at the beginning of the synthetic insert; this in-frame stop codon ensures the termination of translation from the original ribosome-binding site. Thus, all full-length translation of the downstream ORF is initiated from the reconstructed ribosome-binding site in the synthetic insert. This refactoring enables the independent recoding (red box) of ORFs. d, Map of the synthetic genome design with all TCG, TCA and TAG codons removed. Outer ring shows positions (18,218 red bars) of all TCG to AGC, TCA to AGT and TAG to TAA recodings. Grey ring shows positions of designed silent mutations in overlaps (12 green bars), refactoring of 3′, 3′ overlaps (schematic shown in b, 21 blue bars) and refactoring of 5′, 3′ overlapping regions (schematic shown in c, 58 black bars). Pink ring shows 37 fragments of approximately 100 kb in size each. Fragment 37 is shown as 37a and 37b to reflect the final assembly. The sections A to H are indicated.
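At the level of a single, isolated ORF, the scheme in panel a is a codon-by-codon substitution (TCG to AGC, TCA to AGT, TAG to TAA). The minimal Python sketch below applies it in frame only; it is an illustration rather than the authors' design software, and it deliberately ignores overlapping ORFs and embedded regulatory elements, which is why the refactoring in panels b and c and the corrections in Extended Data Figs. 2–5 were needed. The names `RECODE` and `recode_cds` are invented.

```python
# Defined synonymous replacements from panel a (applied to in-frame codons only).
RECODE = {"TCG": "AGC", "TCA": "AGT", "TAG": "TAA"}


def recode_cds(cds):
    """Replace every in-frame TCG, TCA and TAG in a coding sequence by its defined synonym."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = (cds[i:i + 3] for i in range(0, len(cds), 3))
    return "".join(RECODE.get(codon, codon) for codon in codons)


print(recode_cds("ATGTCGTCATCTTAG"))  # ATGAGCAGTTCTTAA
```

Out-of-frame occurrences (for example the TCA spanning codons ATT and CAA in `ATGATTCAATAA`) are left untouched, because only codons read in frame are targets.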

**Fig. 2. Retrosynthesis of the synthetic genome.**
a, Disconnecting the genome into eight sections. The synthetic genome was disconnected into sections A to H, with each section corresponding to approximately 0.5 Mb (step 1). The position of the replication origin oriC (orange square) is indicated. Sections were assembled into a completely recoded genome (in the forward sense, opposite to the direction of the retrosynthesis arrow) by directed conjugation (Fig. 3, Extended Data Fig. 7). b, Disconnecting genome sections into 100-kb fragments. Sections are further disconnected into 4 or 5 fragments of around 100 kb in length each. Section A is depicted, and other sections were treated similarly. Nearly all sections were constructed entirely through consecutive REXER steps, by GENESIS (Extended Data Fig. 1). Each step replaced around 100 kb of wild-type genomic sequence with 100 kb of synthetic fragment (steps 2 and 3). c, Disconnecting each 100-kb synthetic fragment into 10-kb synthetic stretches. Each 100-kb synthetic fragment is further disconnected into 9 to 14 short synthetic stretches of around 10 kb in length (step 4). The BACs that carry 100-kb synthetic fragments (pink) were assembled by homologous recombination in yeast. Each BAC contains Cas9 cleavage sites (black triangles) that enable excision of the synthetic DNA in vivo, homology regions 1 and 2 (HR1 and HR2) for targeting recombination, and the appropriate double-selection cassette. The −2 (sucrose sensitivity, encoded by *sacB*), +2 (chloramphenicol resistance, encoded by *cat*) double-selection cassette is indicated; however, different double-selection cassettes are used for selection in different steps of REXER. A negative-selection marker (*rpsL*; indicated as −1) is used to enable loss of the backbone after REXER. BAC and yeast artificial chromosome (YAC) origins and a URA3 marker, all for maintenance in *E. coli* and *S. cerevisiae*, are indicated.
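The three-level disconnection (sections of about 0.5 Mb, fragments of about 100 kb, stretches of about 10 kb) can be pictured as repeated interval splitting. The hypothetical sketch below divides a coordinate range into near-equal pieces; the real design placed its boundaries by criteria not described in this legend (fragments hold 9 to 14 stretches, for example), so equal division here is only an assumption for illustration, and `split_interval` is an invented name.

```python
def split_interval(start, end, n_parts):
    """Split [start, end) into n_parts contiguous pieces whose lengths differ by at most 1 bp."""
    base, extra = divmod(end - start, n_parts)
    pieces, pos = [], start
    for i in range(n_parts):
        size = base + (1 if i < extra else 0)  # spread any remainder over the first pieces
        pieces.append((pos, pos + size))
        pos += size
    return pieces


section = (0, 500_000)                           # one ~0.5-Mb section
fragments = split_interval(*section, 5)          # ~100-kb fragments (steps 2 and 3)
stretches = split_interval(*fragments[0], 10)    # ~10-kb stretches (step 4)
print(len(fragments), fragments[0])              # 5 (0, 100000)
print(len(stretches), stretches[-1])             # 10 (90000, 100000)
```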

**Fig. 3. Assembly of recoded genome sections to create Syn61.**
Synthetic genomic sections (pink) from multiple individual partially recoded genomes were assembled into a single fully recoded genome in the indicated sequence of conjugations. The donor (d) and recipient (r) strains contain unique recoded genomic sections, denoted in pink. The recoded genomic content from the donor was conjugated in a clockwise manner to replace the corresponding wild-type genomic section (grey) in the recipient. Conjugation proceeded until the final fully recoded A to H strain (that is, Syn61) was assembled. Extended Data Figure 7 shows the process in more detail, including all homology regions.

**Fig. 4. Functional consequences of synonymous codon compression in Syn61.**
a, Synonymous codon compression and deletion of *prfA*, *serU* and *serT*. The grey boxes show the serine codons and stop codons, together with the tRNAs and release factors that decode them in wild-type *E. coli* (wild-type genome). tRNA anticodons and release factors are connected to the codons that they are predicted to read by black lines. The tRNA and release factor genes are shown in the black boxes. Synonymous codon compression leads to a recoded genome (pink boxes), in which tRNAs with CGA anticodons should have no cognate codons and *serT* should be dispensable. All factors that read the target codons should be dispensable in Syn61. b, Co-translational incorporation of the non-canonical amino acid Nε-(((2-methylcycloprop-2-en-1-yl)methoxy)carbonyl)-L-lysine (CYPK), using the orthogonal *Methanosarcina mazei* pyrrolysyl-tRNA synthetase (PylRS)/tRNA^Pyl_CGA pair, was toxic in MDS42, but not in Syn61. When provided with CYPK, this pair will incorporate the non-canonical amino acid in response to TCG codons in a dose-dependent manner. In MDS42 (grey), this incorporation leads to mis-synthesis of the proteome, and toxicity. In Syn61 (pink) (which does not contain TCG codons), this is non-toxic. The lines follow the mean of three biological replicates (each shown as a dot) at each CYPK concentration (0 mM, 0.5 mM, 1 mM, 2.5 mM and 5 mM). Percentage of maximum growth was determined by the final optical density at 600 nm (OD₆₀₀) with the indicated concentration of CYPK divided by the final OD₆₀₀ in the absence of CYPK. Final OD₆₀₀ values were determined after 600 min. c, Synonymous codon compression enables deletion of *serT* in Syn61. PCR flanking the *serT* locus before (−) and after (clones 1 and 2) replacement with a *PheS\**-HygR double-selection cassette; HygR denotes hygromycin resistance (*aph(4)-Ia*), and PheS\* denotes a mutant of *PheS* that encodes a Thr251Ala, Ala294Gly mutant of phenylalanyl-tRNA synthetase. The experiment was performed once. 
See Extended Data Fig. 9. Full gels are in Supplementary Fig. 1.
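The growth metric in panel b is a simple ratio of final OD₆₀₀ values. The Python sketch below restates it; the function names are invented, and the replicate-wise pairing in `mean_percent_growth` (each replicate normalized to its own no-CYPK culture before averaging) is an assumption, since the legend does not say how replicates were matched.

```python
from statistics import mean


def percent_max_growth(final_od_with_cypk, final_od_without_cypk):
    """Final OD600 at a given CYPK concentration as a percentage of the final OD600 without CYPK."""
    if final_od_without_cypk <= 0:
        raise ValueError("reference OD600 must be positive")
    return 100.0 * final_od_with_cypk / final_od_without_cypk


def mean_percent_growth(with_cypk, without_cypk):
    """Mean over biological replicates, pairing each replicate with its own
    no-CYPK culture (assumed pairing)."""
    return mean(percent_max_growth(w, wo) for w, wo in zip(with_cypk, without_cypk))


# Hypothetical final OD600 values for two replicates (not data from the paper).
print(mean_percent_growth([0.5, 0.25], [1.0, 1.0]))  # 37.5
```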

Comment in

Blount BA, Ellis T. Construction of an Escherichia coli genome with fewer codons sets records. Nature. 2019 May;569(7757):492-494. doi: 10.1038/d41586-019-01584-x. PMID: 31097820.

Cited by

Application and Technical Challenges in Design, Cloning, and Transfer of Large DNA.
Bai S, Luo H, Tong H, Wu Y. Bioengineering (Basel). 2023 Dec 15;10(12):1425. doi: 10.3390/bioengineering10121425. PMID: 38136016. Review.
Adding α,α-disubstituted and β-linked monomers to the genetic code of an organism.
Dunkelmann DL, Piedrafita C, Dickson A, Liu KC, Elliott TS, Fiedler M, Bellini D, Zhou A, Cervettini D, Chin JW. Nature. 2024 Jan;625(7995):603-610. doi: 10.1038/s41586-023-06897-6. Epub 2024 Jan 10. PMID: 38200312.
Genomically recoded Escherichia coli with optimized functional phenotypes.
Hemez C, Mohler K, Radford F, Moen J, Rinehart J, Isaacs FJ. bioRxiv [Preprint]. 2024 Aug 29:2024.08.29.610322. doi: 10.1101/2024.08.29.610322. PMID: 39257802.
Sc3.0: revamping and minimizing the yeast genome.
Dai J, Boeke JD, Luo Z, Jiang S, Cai Y. Genome Biol. 2020 Aug 13;21(1):205. doi: 10.1186/s13059-020-02130-z. PMID: 32791980.
Chemical Reaction Models in Synthetic Promoter Design in Bacteria.
Kahramanoğulları O. Methods Mol Biol. 2024;2844:3-31. doi: 10.1007/978-1-0716-4063-0_1. PMID: 39068329.
