Practical Example of Enzyme Activity Prediction Using Structure Prediction and MD Simulation

2024.12.02

Introduction

I am Isosaki from the Business Development Department. As part of our search for useful enzymes, we predict enzyme activity using molecular dynamics (MD) simulations. Starting from enzyme sequences of unknown structure, we predict their 3D structures and run MD simulations on the enzyme-ligand complexes. From the simulation results, we calculate a proprietary digzyme score to predict enzyme activity. In this post, we present an example of predicting the activity of homologs of OleA, a thiolase-like enzyme, from their sequences.

Materials Used for Enzyme Activity Prediction

The natural substrate for OleA is acyl-CoA. The enzyme's Cys143 residue cleaves this acyl group. To investigate this activity, a p-nitrophenolate-based experimental system is used (Figure 1).

**Figure 1.** The reaction catalyzed by OleA. (Excerpt from Robinson et al.)

Results

We predicted whether 59 homologous OleA sequences would hydrolyze one type of p-nitrophenolate, specifically 4-nitrophenyl-hexanoate.

1. Prediction of the 3D Structure of 59 Homologous Sequences

Because none of the 59 homologous sequences had a known structure, we first predicted their 3D structures. From the predicted structures, we also identified the location of the active residue and the substrate-binding pocket. Figure 2 shows a predicted 3D structure of a homologous sequence and the location of the active residue, Cys. Figure 3 shows the predicted location of the substrate-binding pocket.

**Figure 2.** Predicted 3D structure based on sequence information and active residues. The circled Cys represents the predicted active residue.

**Figure 3.** Predicted substrate-binding pocket. The pocket is shown as a space represented by red dots, while the corresponding enzyme surface is depicted in the purple area.

2. Molecular Dynamics Simulation

Next, we placed the enzyme-substrate complex, consisting of the enzyme and 4-nitrophenyl-hexanoate, in a system containing water molecules and ions, and ran molecular dynamics simulations (Figure 4).

**Figure 4.** The enzyme-substrate complex placed in water molecules and ions for molecular dynamics simulation.

3. Calculation of digzyme's Proprietary Enzyme Activity Prediction Score

Finally, based on the results of the molecular dynamics simulations, we calculated digzyme's proprietary score. Figure 5 shows the predicted scores for all 59 sequences, sorted in descending order. Sequences with experimentally confirmed activity are highlighted in pink, and those without activity are shown in gray. Sequences with a score of 70 or higher (above the red line in Figure 5) were classified as active. We predicted 9 sequences to be active, 3 of which were experimentally confirmed to have activity, resulting in a positive predictive value (PPV) of 0.30. The true positive rate (TPR) was 0.6, and the false positive rate (FPR) was 0.13. This indicates that most inactive sequences were ranked lower, while active sequences were concentrated in the higher ranks.

**Figure 5.** Ranking of 59 homologous sequences based on enzyme activity scores. The 9 sequences above the red line (score of 70 or higher) were predicted to have activity. The 5 sequences in pink in the bar graph were experimentally confirmed to have activity.
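For readers who want to check metrics of this kind, here is a minimal sketch of how PPV, TPR, and FPR follow from summary counts. The function name and its parameters are our own illustration and are not part of digzyme's scoring pipeline. The inputs taken from the text are the 59 sequences, the 5 experimentally active ones, and the 3 active ones above the threshold.

```python
def classification_metrics(n_total, n_active, n_predicted, n_true_positive):
    """Return (PPV, TPR, FPR) from summary counts of a binary prediction.

    n_total:         number of sequences evaluated
    n_active:        number experimentally confirmed active
    n_predicted:     number predicted active (score at or above the threshold)
    n_true_positive: number predicted active AND confirmed active
    """
    n_inactive = n_total - n_active
    n_false_positive = n_predicted - n_true_positive
    ppv = n_true_positive / n_predicted   # fraction of predictions that are correct
    tpr = n_true_positive / n_active      # fraction of active sequences recovered
    fpr = n_false_positive / n_inactive   # fraction of inactive sequences flagged
    return ppv, tpr, fpr


# 59 sequences, 5 active, 3 of the active ones above the threshold.
# The number of predicted positives is varied to show its effect.
for n_predicted in (9, 10):
    ppv, tpr, fpr = classification_metrics(59, 5, n_predicted, 3)
    print(f"{n_predicted} predicted active: "
          f"PPV={ppv:.2f}, TPR={tpr:.2f}, FPR={fpr:.2f}")
```

TPR depends only on the 3 hits among 5 active sequences, so it is 0.60 either way. PPV and FPR depend on how many sequences are counted above the threshold: 9 gives 0.33 and 0.11, while 10 gives 0.30 and 0.13.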

Conclusion

In this post, we used our enzyme activity prediction technology to predict the activity of OleA homologs and validated the predictions against experimental results. Typically, 5 to 10 sequences are synthesized for experimental validation. In this case, two of the top five ranked sequences were experimentally confirmed to have activity, demonstrating the practical accuracy of our enzyme activity prediction method. The dataset was deliberately chosen to simulate a scenario in which only a small fraction of the enzyme sequences in the population exhibit activity (here, 5 of 59 sequences). Because the false positive rate was kept low, we were able to predict enzyme activity with high accuracy even in this setting.
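The "hits among the top-ranked candidates" view used here can be sketched as follows. This is an illustration only: the function name, the sequence names, and the scores and activity labels are hypothetical and are not the values behind Figure 5.

```python
def hits_in_top_k(scores, is_active, k):
    """Count experimentally active sequences among the k highest-scoring ones."""
    # Rank sequence names by score, highest first.
    ranked = sorted(scores, key=scores.get, reverse=True)
    return sum(is_active[name] for name in ranked[:k])


# Hypothetical scores and labels, NOT the data from Figure 5.
scores = {"seqA": 92.0, "seqB": 85.5, "seqC": 78.0,
          "seqD": 71.2, "seqE": 64.0, "seqF": 40.3}
is_active = {"seqA": True, "seqB": False, "seqC": True,
             "seqD": False, "seqE": False, "seqF": True}

k = 5
hits = hits_in_top_k(scores, is_active, k)
print(f"{hits} of the top {k} are active (precision@{k} = {hits / k:.2f})")
```

As a point of comparison, with 5 active sequences among 59, five sequences picked at random would be expected to contain about 0.4 active ones (5 × 5/59), so two hits in the top five is roughly a five-fold enrichment over chance.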

Acknowledgments

We would like to thank the authors of the following paper for the experimental data used in this enzyme activity prediction:

Robinson et al. (2020). Machine learning-based prediction of activity and substrate specificity for OleA enzymes in the thiolase superfamily. Synthetic Biology.
