ACS Synth Biol. 2023 Sep 15;12(9):2750-2763.
doi: 10.1021/acssynbio.3c00358. Epub 2023 Sep 6.

DNA Structure Design Is Improved Using an Artificially Expanded Alphabet of Base Pairs Including Loop and Mismatch Thermodynamic Parameters

Tuan M Pham et al. ACS Synth Biol.

Abstract

We show that in silico design of DNA secondary structures is improved by extending the base pairing alphabet beyond A-T and G-C to include the pair between 2-amino-8-(1'-β-D-2'-deoxyribofuranosyl)-imidazo-[1,2-a]-1,3,5-triazin-(8H)-4-one and 6-amino-3-(1'-β-D-2'-deoxyribofuranosyl)-5-nitro-(1H)-pyridin-2-one, abbreviated as P and Z. To obtain the thermodynamic parameters needed to include P-Z pairs in the designs, we performed 47 optical melting experiments and combined the results with previous work to fit free energy and enthalpy nearest neighbor folding parameters for P-Z pairs and G-Z wobble pairs. We find G-Z pairs have stability comparable to that of A-T pairs and should therefore be included as base pairs in structure prediction and design algorithms. Additionally, we extrapolated the set of loop, terminal mismatch, and dangling end parameters to include the P and Z nucleotides. These parameters were incorporated into the RNAstructure software package for secondary structure prediction and analysis. Using the RNAstructure Design program, we solved 99 of the 100 design problems posed by Eterna using the ACGT alphabet or supplementing it with P-Z pairs. Extending the alphabet reduced the propensity of sequences to fold into off-target structures, as evaluated by the normalized ensemble defect (NED). The NED values were improved relative to those from the Eterna example solutions in 91 of 99 cases in which Eterna-player solutions were provided. P-Z-containing designs had average NED values of 0.040, significantly below the 0.074 of standard-DNA-only designs, and inclusion of the P-Z pairs decreased the time needed to converge on a design. This work provides a sample pipeline for inclusion of any expanded alphabet nucleotides into prediction and design workflows.

Keywords: DNA folding thermodynamics; DNA secondary structure design; expanded DNA alphabet; synthetic biology.

Conflict of interest statement

The authors declare the following competing financial interest(s): DNA Analytics applied for a U.S. patent, PCT/US23/64288 (Generating Parameters to Predict Hybridization Strength of Nucleic Acid Sequences), to protect the applicable optical melt curve fit methods.

Figures

Figure 1
(Top) The P–Z base pair has three hydrogen bonds. (Middle) The proposed G–Z “wobble” base pair with two hydrogen bonds. (Bottom) The deprotonated G–Z pair is dominant at elevated pH. All results in this paper refer to pH 7.
Figure 2
Predicted ΔG°37 as a function of experimental ΔG°37 for the P–Z stacks. The P–Z stack component is the measured folding free energy change minus the contributions of Watson–Crick–Franklin stacks, intermolecular initiation, and the symmetry correction (applied to self-complementary duplexes only). The y = x line is shown for reference. The duplex (ZGCATGCP)2 was removed from the fit as an outlier. The duplex (GTPPZZAC)2 has the largest residual for a value used in the fit, at 1.41 kcal/mol. The data were derived from Hoshika et al., Wang et al., and additional duplexes reported here (Table S1).
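The subtraction described in this caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the numeric inputs are placeholders rather than fitted parameters from the paper, and the self-complementarity check simply uses the A-T, G-C, and P-Z pairings named in the abstract.

```python
# Pairing rules named in the source: A-T, G-C, and the added P-Z pair.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "P": "Z", "Z": "P"}


def is_self_complementary(seq: str) -> bool:
    """True if the strand equals its own reverse complement."""
    return seq == "".join(COMPLEMENT[base] for base in reversed(seq))


def pz_stack_component(measured_dg, wcf_stack_dgs, initiation_dg, symmetry_dg, seq):
    """Free energy left over for P-Z stacks (kcal/mol).

    Subtracts the Watson-Crick-Franklin stack terms and the intermolecular
    initiation term from the measured value; the symmetry correction is
    subtracted only for self-complementary duplexes.
    """
    correction = symmetry_dg if is_self_complementary(seq) else 0.0
    return measured_dg - sum(wcf_stack_dgs) - initiation_dg - correction


# Placeholder values (hypothetical, NOT taken from the paper's tables).
component = pz_stack_component(
    measured_dg=-8.0,
    wcf_stack_dgs=[-1.5, -2.0],
    initiation_dg=2.0,
    symmetry_dg=0.4,
    seq="GTPPZZAC",
)
print(round(component, 2))
```

Both duplexes named in the caption, GTPPZZAC and ZGCATGCP, are their own reverse complements under these pairing rules, which is why the symmetry correction applies to them.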
Figure 3
Dangling ends measured with P and Z nucleotides. (A) The stability of 5′ dangling nucleotides on Z–P pairs (blue), compared to C–G pairs (orange). The dangling end is less stabilizing on Z–P than on C–G pairs. (B) The stability of P and Z dangling ends, as compared to canonical purines and pyrimidines, respectively. The closing pair and orientation are indicated along the top. The dangling end nucleotide identity is directly above each blue or orange bar.
Figure 4
Stability of PP and ZZ terminal mismatches (blue), compared to G–G, A–A, C–C, and T–T terminal mismatches (orange). The left series have a G–C terminal pair and the right series have a T–A terminal pair.
Figure 5
Stability increments of terminal mismatches on terminal P–Z pairs (blue). The stabilities of analogous terminal mismatches on G–C pairs are shown (orange).
Figure 6
Stability of single mismatches (1 × 1 internal loops). Stability increments for the mismatch motif are shown, where the stabilities of the closing helices are subtracted from the duplex stability. The internal loops show a marked dependence on the distance from the helix end, where mismatches farther from helix ends are destabilizing and mismatches closer to helix ends are less destabilizing or stabilizing for helix formation.
Figure 7
Stability increments for tandem mismatches (2 × 2 internal loops). The tandem Z–Z and tandem P–P mismatches (blue) are compared to tandem pyrimidine and tandem purine mismatches (orange), respectively.
Figure 8
NED is significantly improved with the incorporation of P–Z pairs in designs (P = 2.7 × 10⁻¹³). The NED of Eterna 100 designs using DNA and P–Z pairs is plotted against the NED of designs using canonical DNA only. The NED of the best of five calculations is shown. Each point is a single example problem from the Eterna 100 set. Points below the diagonal line (plotted as a visual guide) are cases where incorporation of P–Z pairs improved the designs.
Figure 9
Incorporation of P–Z pairs improves the design of “Iron Cross”, problem 35 from the Eterna 100 set. Panel A shows the best design using canonical DNA only. Panel B shows the design incorporating P–Z into DNA. Bases are color-annotated with their probability of forming the correct structure, either the probability of folding into the specified target base pair or the probability of being unpaired in the target structure. The nucleotides in the P–Z containing sequence all form the desired structure with ≥95% probability. The structure composed of canonical DNA has a substantial number of nucleotides that are estimated to fold with <50% probability to the target structure.
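The per-nucleotide probabilities described in this caption are the quantities that the normalized ensemble defect (NED) averages. The sketch below assumes the commonly used definition, NED = 1 minus the mean probability that each nucleotide adopts its target state; the record itself does not spell out the formula, and the function name and input layout (a dictionary of pair probabilities with 0-based indices) are hypothetical choices for illustration.

```python
def normalized_ensemble_defect(pair_probs, target_pairs, n):
    """NED for a sequence of length n.

    pair_probs: dict mapping (i, j) with i < j to the equilibrium
        probability that nucleotides i and j are paired.
    target_pairs: set of (i, j) with i < j paired in the target structure.
    """
    # Total probability that each nucleotide is paired with any partner.
    paired_prob = [0.0] * n
    for (i, j), p in pair_probs.items():
        paired_prob[i] += p
        paired_prob[j] += p

    # Look up each nucleotide's intended partner, if it has one.
    partner = {}
    for i, j in target_pairs:
        partner[i] = j
        partner[j] = i

    correct = []
    for k in range(n):
        if k in partner:
            # Probability of forming the specified target pair.
            key = (min(k, partner[k]), max(k, partner[k]))
            correct.append(pair_probs.get(key, 0.0))
        else:
            # Probability of being unpaired, as the target requires.
            correct.append(1.0 - paired_prob[k])

    return 1.0 - sum(correct) / n


# Toy 4-nucleotide example with made-up probabilities.
ned = normalized_ensemble_defect({(0, 3): 0.9, (1, 2): 0.1}, {(0, 3)}, 4)
print(round(ned, 3))
```

A design in which every nucleotide adopts its target state with probability 1 gives an NED of 0, so lower values indicate less folding into off-target structures.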
Figure 10
Nearest neighbor free energy parameters for stacks containing natural DNA, P–Z pairs, and the stable G–Z pair.


