J Chem Inf Model. 2024 May 13;64(9):3826-3840.
doi: 10.1021/acs.jcim.4c00234. Epub 2024 May 2.

Streamlining Computational Fragment-Based Drug Discovery through Evolutionary Optimization Informed by Ligand-Based Virtual Prescreening



Rohan Chandraghatgi et al. J Chem Inf Model.

Abstract

Recent advances in computational methods promise to dramatically accelerate drug discovery. While mathematical modeling and machine learning have become vital for predicting drug–target interactions and properties, the vast and complex chemical space leaves much of the potential of computational drug discovery untapped. This paper builds on our recently published computational fragment-based drug discovery (FBDD) method, fragment databases from screened ligand drug discovery (FDSL-DD). FDSL-DD uses in silico screening to identify ligands from a vast library and then fragments them, attaching to each fragment specific attributes based on the predicted binding affinity and interactions of its parent ligand with the target subdomain. Here, we propose a two-stage optimization method that uses the information from prescreening to optimize computational ligand synthesis. We hypothesize that using prescreening information during optimization shrinks the search space and focuses it on promising regions, thereby improving the optimization of candidate ligands. The first optimization stage assembles the fragments into larger compounds using a genetic algorithm, and the second stage iteratively refines these compounds to produce candidates with enhanced bioactivity. To demonstrate broad applicability, we apply the methodology to three diverse protein targets found in human solid cancers, bacterial antimicrobial resistance, and the SARS-CoV-2 virus. Combined, FDSL-DD and the proposed two-stage optimization approach yield high-affinity ligand candidates more efficiently than other state-of-the-art computational FBDD methods. We further show that a multiobjective optimization method that accounts for drug-likeness can still produce candidate ligands with high binding affinity. Overall, the results demonstrate that integrating detailed chemical information with a constrained search framework can markedly improve the initial drug discovery process, offering a more precise and efficient route to developing new therapeutics.


Conflict of interest statement

The authors declare no competing financial interest.

Figures

Figure 1
The proposed integrated FDSL-DD and optimization method for computational drug design against a specific protein target begins with a library of ligands that are prescreened by docking with AutoDock Vina, computationally fragmented, and assigned the binding affinities and other attributes computed for their parent ligands during prescreening, as previously described. The fragments are recombined using an evolutionary algorithm informed by parent ligand attributes, producing a computational ligand population that is both diverse and has improved characteristics. The ligands output by this first optimization stage are then refined through an iterative optimization that also uses parent ligand attributes, together with information about fragment positions in the binding pocket, to generate a population of novel candidate ligands for further evaluation and validation. (Components connected by gray lines are fully described in our previous work; components connected by solid lines are the subject of this paper.)
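The attribute-inheritance step in this caption can be illustrated with a minimal Python sketch. The names `ParentLigand`, `FragmentRecord`, and `inherit_parent_attributes` are hypothetical, fragments are plain identifier strings standing in for RDKit fragment SMILES, and the rule that a fragment shared by several parents keeps the best (most negative) parent score is an assumption, not necessarily the published FDSL-DD rule.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ParentLigand:
    name: str
    vina_score: float     # kcal/mol from prescreening; more negative = stronger binding
    fragments: List[str]  # fragment identifiers (e.g., SMILES) after fragmentation


@dataclass
class FragmentRecord:
    best_parent_score: float
    parent_count: int = 0
    parents: List[str] = field(default_factory=list)


def inherit_parent_attributes(parents: List[ParentLigand]) -> Dict[str, FragmentRecord]:
    """Attach prescreening attributes of parent ligands to their fragments.

    ASSUMPTION: a fragment found in several parents keeps the best (lowest)
    parent score; this aggregation rule is illustrative only.
    """
    table: Dict[str, FragmentRecord] = {}
    for parent in parents:
        for frag in dict.fromkeys(parent.fragments):  # de-duplicate, keep order
            rec = table.setdefault(frag, FragmentRecord(best_parent_score=parent.vina_score))
            rec.best_parent_score = min(rec.best_parent_score, parent.vina_score)
            rec.parent_count += 1
            rec.parents.append(parent.name)
    return table


parents = [
    ParentLigand("L1", -9.2, ["c1ccccc1", "C(=O)O"]),
    ParentLigand("L2", -7.5, ["c1ccccc1", "N"]),
]
table = inherit_parent_attributes(parents)
# table["c1ccccc1"] -> best_parent_score=-9.2, parent_count=2
```

Keeping a per-fragment record rather than a single number leaves room for the other parent attributes the caption mentions (for example, interaction data) to be attached the same way.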
Figure 2
In the first phase of ligand optimization, a genetic algorithm creates ligands from the fragments produced by the prescreening and fragmentation pipeline shown in Figure 1 of Wilson et al. Fragments are treated as genes and assigned a weighted rank that determines their selection probability. Initially, fragments are chosen at random, and the resulting ligands are scored with AutoDock Vina (see “Autodock Vina” block) and optionally assessed for drug-likeness via QED scores. Subsequent generations of ligands are produced by mutation (“Mutation” block), crossover, and elitism, subject to molecular weight and fragment-usage rules. The ligands are then cleaned to ensure that all fragments are used in the resulting ligand (“clean ligands” block) and assembled into new generations using the BRICSBuild function of RDKit's BRICS module (“BRICS.Build generates ligands” block).
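The first-stage genetic algorithm can be sketched as follows. This is a standard-library toy under stated assumptions: a ligand is a fixed-length list of fragment indices, `toy_fitness` (a sum of inherited fragment scores) is a hypothetical stand-in for docking the BRICS-assembled ligand with AutoDock Vina, parents are chosen by size-2 tournaments, and the 500 g/mol cap and other defaults are illustrative values rather than the paper's settings. Rank weights bias fragment sampling toward fragments with better parent scores.

```python
import random


def rank_weights(scores):
    """Weight fragments by rank: best (most negative) score gets weight n, worst gets 1."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    weights = [0] * len(scores)
    for rank, idx in enumerate(order):
        weights[idx] = len(scores) - rank
    return weights


def evolve(frag_scores, frag_mw, fitness, genome_len=3, pop_size=20,
           generations=30, elite=2, mutation_rate=0.2, max_mw=500.0, seed=0):
    """Toy GA over fragment-index genomes (genome_len >= 2); lower fitness is better."""
    rng = random.Random(seed)
    frag_ids = range(len(frag_scores))
    weights = rank_weights(frag_scores)

    def pick():
        # Rank-weighted sampling of a single fragment ("gene").
        return rng.choices(frag_ids, weights=weights)[0]

    def feasible(genome):
        return sum(frag_mw[i] for i in genome) <= max_mw

    def random_genome():
        for _ in range(10_000):
            genome = [pick() for _ in range(genome_len)]
            if feasible(genome):
                return genome
        raise ValueError("no genome satisfies the molecular weight cap")

    def tournament(pop):
        return min(rng.sample(pop, 2), key=fitness)

    pop = [random_genome() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        nxt = [list(g) for g in pop[:elite]]           # elitism
        while len(nxt) < pop_size:
            a, b = tournament(pop), tournament(pop)
            cut = rng.randrange(1, genome_len)          # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < mutation_rate:            # point mutation
                child[rng.randrange(genome_len)] = pick()
            nxt.append(child if feasible(child) else list(a))
        pop = nxt
    return min(pop, key=fitness)


FRAG_SCORES = [-9.0, -8.5, -7.0, -6.0, -5.5, -4.0]  # inherited parent scores (toy)
FRAG_MW = [150.0, 180.0, 120.0, 90.0, 200.0, 60.0]  # g/mol (toy)


def toy_fitness(genome):
    # Hypothetical stand-in for docking the assembled ligand with AutoDock Vina.
    return sum(FRAG_SCORES[i] for i in genome)


best = evolve(FRAG_SCORES, FRAG_MW, toy_fitness)
```

Because the top `elite` genomes are copied unchanged into every generation, the best fitness never gets worse between generations; an infeasible child is replaced by a copy of a feasible parent so that each generation always fills.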
Figure 3
The iterative fragment addition stage, shown schematically here, can start from any ligand, but in our method it starts from candidate ligands synthesized in the preceding genetic algorithm-based phase (see Figure 2), together with fragments obtained from the initial ligand prescreening and fragmentation pipeline. The optimization objective of this phase is the binding affinity score predicted by AutoDock Vina, either alone or summed with the quantitative estimate of drug-likeness (QED) score, which evaluates molecular properties beneficial for drug design. The inputs are thus premade starter ligands and an amino acid-associated fragment data set. In each iteration, every ligand is evaluated and may be merged with the protein PDB file for further assessment by the Protein–Ligand Interaction Profiler (PLIP). Fragments are added at targeted regions of the ligand to improve binding affinity while keeping the molecular weight under 700 g/mol, so that the results remain viable drug candidates. This process cyclically refines the ligand structures, using tools such as RDKit for optimization and 3D structure generation, until a prescribed iteration limit is reached or an optimizable ligand is generated.
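A greedy version of the iterative fragment-addition loop is sketched below. The callables `mw`, `vina`, and `qed` are hypothetical stand-ins for RDKit molecular-weight and QED calculations and for AutoDock Vina docking, fragments are simply appended rather than placed at PLIP-identified regions, and, because a lower Vina score is better while a higher QED is better, the signed combination `vina - qed` is an assumption about how the sum is formed.

```python
from typing import Callable, List, Sequence

MAX_MW = 700.0  # g/mol; ligands must stay under this cap

Score = Callable[[Sequence[str]], float]


def objective(vina: float, qed: float, use_qed: bool) -> float:
    """Lower is better. ASSUMPTION: the multiobjective form is vina - qed,
    so better drug-likeness (higher QED) lowers the objective."""
    return vina - qed if use_qed else vina


def iterative_addition(start: Sequence[str], fragments: Sequence[str],
                       mw: Score, vina: Score, qed: Score,
                       use_qed: bool = False, max_iter: int = 10) -> List[str]:
    """Greedy best-improvement fragment addition under the molecular weight cap."""
    ligand = list(start)
    best = objective(vina(ligand), qed(ligand), use_qed)
    for _ in range(max_iter):
        best_trial = None
        for frag in fragments:
            trial = ligand + [frag]          # hypothetical: attach at the end
            if mw(trial) >= MAX_MW:
                continue                     # violates the < 700 g/mol rule
            score = objective(vina(trial), qed(trial), use_qed)
            if score < best:
                best, best_trial = score, trial
        if best_trial is None:
            break                            # no feasible improving addition
        ligand = best_trial
    return ligand


# Toy stand-ins for RDKit molecular weight/QED and AutoDock Vina docking.
FRAG_MW = {"A": 200.0, "B": 150.0, "C": 300.0}
FRAG_VINA = {"A": -1.0, "B": -0.5, "C": -2.0}


def toy_mw(lig):
    return sum(FRAG_MW[f] for f in lig)


def toy_vina(lig):
    return sum(FRAG_VINA[f] for f in lig)


def toy_qed(lig):
    return 1.0 / (1 + len(lig))


result = iterative_addition(["A"], ["A", "B", "C"], toy_mw, toy_vina, toy_qed)
# result -> ["A", "C", "B"] (650 g/mol; any further addition would reach 700 or more)
```

Choosing the single best addition per iteration, rather than accepting the first improvement, keeps each iteration's change easy to attribute to one fragment.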
Figure 4
Histograms comparing fragment pool generation methodologies. The top three graphs include the complete results of each iterative run. The “worst pool” trial used the 1000 fragments with the worst VINA scores in the source fragment data set. The “large pool” trial included all fragments. The “unprioritized” and “prioritized” trials used the same fragment subset, built by prioritizing VINA score and taking the top 1000 fragments associated with each amino acid. Unprioritized trials attach fragments drawn at random from the pool to the ligands, whereas prioritized trials use a separate subpool for each amino acid: for a given target amino acid, the prioritized trial suggests a fragment that the PLIP screening showed had previously interacted with that amino acid.
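The difference between prioritized and unprioritized fragment suggestion can be sketched as per-residue subpools. The input format, (fragment, residue, VINA score) tuples distilled from PLIP reports, is hypothetical, as are the function names; the `top_k` cap mirrors the 1000-fragment limit in the caption, and falling back to the whole pool when a residue has no subpool is an assumption.

```python
import random
from collections import defaultdict


def build_subpools(interactions, top_k=1000):
    """interactions: (fragment_id, residue, vina_score) tuples (hypothetical format).

    Keeps, per residue, the top_k fragments ranked by their best (lowest) VINA score.
    """
    best = defaultdict(dict)
    for frag, residue, score in interactions:
        prev = best[residue].get(frag)
        if prev is None or score < prev:
            best[residue][frag] = score
    return {res: sorted(frags, key=frags.get)[:top_k] for res, frags in best.items()}


def suggest_fragment(target_residue, pool, subpools, prioritized, rng):
    """Prioritized: draw from fragments that previously interacted with the residue.
    Unprioritized (or no subpool for that residue): draw from the whole pool."""
    if prioritized and subpools.get(target_residue):
        return rng.choice(subpools[target_residue])
    return rng.choice(pool)


interactions = [("F1", "ASP", -8.0), ("F2", "ASP", -9.0),
                ("F3", "LYS", -7.0), ("F1", "ASP", -8.5)]
subpools = build_subpools(interactions, top_k=1)  # {"ASP": ["F2"], "LYS": ["F3"]}
pool = ["F1", "F2", "F3"]
rng = random.Random(0)
```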
Figure 5
Comparison of ligands generated iteratively by AutoGrow4 and by FDSL-DD. The histogram data for the proposed method were taken from the prioritized trials described in Section 1.
Figure 6
Mean VINA scores at each iteration of the DeepFrag runs using the default DeepFrag fragments (i.e., those from the model training data), with the unbiased standard error of the mean for each iteration shown as error bars. Scores tend to increase with each DeepFrag iteration, indicating weaker binding affinities. The best VINA scores for TIPE2, RelA, and Spike RBD are −12.17, −11.47, and −9.895 kcal/mol, respectively. The y-axis is inverted (values decrease from bottom to top) so that stronger binding affinities appear higher. The y-axis range differs for each target so that the similar trend across targets is visible in each panel.
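The error bars use the standard error of the mean. A minimal sketch, assuming “unbiased” refers to the Bessel-corrected (n − 1) sample standard deviation, computes the mean and standard error per iteration; the dictionary layout of scores by iteration and the function names are hypothetical.

```python
import math
import statistics


def mean_and_sem(scores):
    """Return (mean, standard error of the mean) for one iteration's VINA scores.

    statistics.stdev divides by n - 1 (Bessel's correction).
    """
    if len(scores) < 2:
        raise ValueError("need at least two scores for a standard error")
    return statistics.fmean(scores), statistics.stdev(scores) / math.sqrt(len(scores))


def summarize_runs(scores_by_iteration):
    """Map iteration -> list of VINA scores to iteration -> (mean, SEM)."""
    return {it: mean_and_sem(s) for it, s in sorted(scores_by_iteration.items())}


summary = summarize_runs({0: [-10.0, -9.0, -8.0], 1: [-9.0, -8.5, -8.0]})
# summary[0] -> (-9.0, about 0.577), since stdev = 1.0 and SEM = 1 / sqrt(3)
```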
Figure 7
Histograms comparing multiobjective and VINA-only prioritization in the iterative approach, in terms of final VINA and QED scores. Lower VINA scores indicate stronger binding affinity, and higher QED scores indicate better drug-likeness. Red lines mark the 50th and 95th percentile scores, which divide each data set into regions. Both iterative runs start from the same set of ligands produced by the genetic algorithm with multiobjective prioritization; this starting set was selected based on the best multiobjective scores.
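The percentile cut lines can be reproduced with the standard library. In this sketch, scores are first oriented so that larger means better (VINA scores are negated), and the 50th and 95th percentiles then split the data into three regions; this orientation, the region labels, and the function name are assumptions about how the figure was drawn. `statistics.quantiles(..., method="inclusive")` uses the same linear interpolation as NumPy's default percentile.

```python
import statistics


def segment_scores(scores, lower_is_better):
    """Label each score by region: below the 50th percentile, between the 50th
    and 95th, or at/above the 95th (measured in the 'better' direction)."""
    oriented = [-s if lower_is_better else s for s in scores]
    cuts = statistics.quantiles(oriented, n=100, method="inclusive")
    p50, p95 = cuts[49], cuts[94]  # 50th and 95th percentile cut points
    labels = []
    for s in oriented:
        if s >= p95:
            labels.append("top 5%")
        elif s >= p50:
            labels.append("50th-95th")
        else:
            labels.append("bottom 50%")
    return labels


vina_scores = [-float(i) for i in range(21)]  # 0.0, -1.0, ..., -20.0 kcal/mol (toy)
labels = segment_scores(vina_scores, lower_is_better=True)
# Only the two strongest binders (-19.0 and -20.0) land in "top 5%".
```

For QED, the same function is called with `lower_is_better=False`, since a higher QED is already the better direction.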
Figure 8
Focusing on the strongest ligand candidates obtained from multiobjective optimization, these histograms plot VINA score against QED score for the 95th-percentile ligands by VINA score for each protein target. The horizontal line marks the 97.5th percentile VINA score, and the vertical line marks the 50th percentile (median) QED score of those 95th-percentile ligands.
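A shortlist in the spirit of this figure can be sketched as a quadrant filter. The `percentile` helper reproduces NumPy's default linear interpolation. Treating percentiles in the stronger-binding direction (lower VINA), computing the 97.5th percentile over the full data set, and keeping only ligands beyond both lines are assumptions, as are the function names and the (name, vina, qed) tuple layout.

```python
import math


def percentile(values, q):
    """Percentile with linear interpolation between closest ranks, 0 <= q <= 100
    (the same rule as NumPy's default 'linear' method)."""
    data = sorted(values)
    pos = (len(data) - 1) * q / 100
    lo = math.floor(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


def shortlist(ligands):
    """ligands: list of (name, vina, qed). Returns (names, vina_cut, qed_cut)."""
    strength = [-vina for _, vina, _ in ligands]            # larger = stronger binding
    p95 = percentile(strength, 95)
    top = [lig for lig in ligands if -lig[1] >= p95]        # 95th-percentile ligands
    vina_cut = -percentile(strength, 97.5)                  # horizontal line
    qed_cut = percentile([qed for _, _, qed in top], 50)    # vertical line (median)
    names = [name for name, vina, qed in top if vina <= vina_cut and qed > qed_cut]
    return names, vina_cut, qed_cut


ligands = [(f"L{i}", -float(i), 0.5) for i in range(21)]  # toy data
ligands[19] = ("L19", -19.0, 0.3)
ligands[20] = ("L20", -20.0, 0.7)
names, vina_cut, qed_cut = shortlist(ligands)
# names -> ["L20"], vina_cut -> -19.5, qed_cut is approximately 0.5
```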


