The eternal debate – one score function to rule them all?
Molecular modeling packages use various types of score functions, depending on the task and the required level of atomistic description. For instance, molecular dynamics (MD) simulation packages often use a molecular mechanics force field to model small molecules, peptides, and small proteins, and allow exploring the folding trajectories of the latter two. Modeling chemical reactions requires a description of the electron configurations through quantum-mechanical methods, which are expensive to compute and feasible only for small systems. A combined QM/MM (quantum mechanics / molecular mechanics) approach allows modeling chemical reactions in larger systems such as enzymes. Monte Carlo simulation packages often rely (at least partly) on knowledge-based score functions that are statistically derived from experimentally validated datasets through an inverse Boltzmann relationship (see Figure).
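The inverse Boltzmann relationship converts observed frequencies of a structural feature (e.g. a pair distance) into an effective energy: E(r) = -kT ln(p_obs(r) / p_ref(r)), so features seen more often than in a reference state get favorable (negative) energies. A minimal sketch, with entirely made-up histogram counts for illustration:

```python
import numpy as np

def inverse_boltzmann(obs_counts, ref_counts, kT=0.593):
    """Knowledge-based potential from observed vs. reference counts.

    E(r) = -kT * ln(p_obs(r) / p_ref(r)); kT ~ 0.593 kcal/mol at 298 K.
    """
    obs_counts = np.asarray(obs_counts, dtype=float)
    ref_counts = np.asarray(ref_counts, dtype=float)
    p_obs = obs_counts / obs_counts.sum()   # observed probability per bin
    p_ref = ref_counts / ref_counts.sum()   # reference-state probability
    return -kT * np.log(p_obs / p_ref)

# Hypothetical pair-distance histogram: bins where a contact occurs more
# often than in the reference state receive a negative (favorable) energy.
obs = [10, 80, 40, 20]   # observed counts per distance bin
ref = [25, 25, 25, 25]   # uniform reference state
energies = inverse_boltzmann(obs, ref)
```

Real knowledge-based terms differ mainly in how the reference state is defined and in the features being counted; the functional form above is the common core.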
Figure: Most prevalent score terms in the Rosetta score function. Figure adapted from Koehler Leman et al., Nature Methods, 2020.
These score functions rely on specific assumptions and are often parameterized based on various experimental observations. The observations differ by task, which is why not all modeling approaches work equally well for all systems. For instance, the force fields in MD simulation packages are parameterized on small molecules, proteins, or peptides and require expertise to set up. Score functions in Monte Carlo packages like Rosetta are often parameterized on proteins of various sizes. Force fields and score functions are also parameterized depending on the level of granularity. For instance, low-resolution or coarse-grained score functions merge the description of several atoms to improve speed, whereas high-resolution score functions describe all atoms separately at the expense of computation time.
The latest default score function in Rosetta, REF2015 (short for Rosetta Energy Function 2015), has been parameterized using both structural knowledge and thermodynamic observables. This was accomplished using a derivative-free Nelder-Mead simplex optimization approach (Park et al., JCTC, 2016), the result of which is a score function that is scaled in kcal/mol. Much work has also gone into generalizing the REF2015 score function to enable evaluation of conformational energies of non-canonical protein and protein-like heteropolymers (Bhardwaj et al., Nature, 2016). Our and other communities are currently testing the performance of score functions based on deep learning approaches. Speed would be an obvious advantage, but accuracy and reliability for different systems are open questions.
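Derivative-free simplex optimization of this kind can be sketched with SciPy's Nelder-Mead implementation. The objective below is a toy stand-in (fitting weights of hypothetical score terms to made-up target energies), not the actual REF2015 fitting target, which used structural and thermodynamic benchmark data:

```python
import numpy as np
from scipy.optimize import minimize

# Toy stand-in objective: squared error between weighted score terms and
# hypothetical reference energies. All numbers here are illustrative.
term_values = np.array([[1.0, 0.5, 2.0],
                        [0.2, 1.5, 0.3],
                        [1.1, 0.4, 0.9]])   # per-structure score terms
target = np.array([2.0, 1.0, 1.5])          # hypothetical reference energies

def objective(weights):
    predicted = term_values @ weights
    return np.sum((predicted - target) ** 2)

# Nelder-Mead requires no gradients, matching the derivative-free character
# of the simplex approach used to fit score-function weights.
result = minimize(objective, x0=np.ones(3), method="Nelder-Mead",
                  options={"xatol": 1e-8, "fatol": 1e-8})
```

The appeal of a derivative-free method is that the objective can include discrete or noisy benchmark metrics (e.g. decoy discrimination) for which gradients are unavailable.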
The fundamental question is whether it is possible, or even realistic, to use a single score function for all types of applications. Rosetta has general score functions (Alford et al., JCTC, 2017), like REF2015, but as mentioned above, many applications have specialized score functions that were developed and refined over many years. While nature uses a single energy function in which physical systems live and chemical reactions proceed, even Mother Nature occasionally makes mistakes, misfolds proteins, or fails to find the correct binding interface, perhaps by design, to allow for evolution in the first place. Maybe having several score functions optimized for different tasks is more effective at predicting the 'correct' answer more often. The multi-dimensionality of the search space makes it impossible to always predict the correct solution, and a single score function rarely performs best across all tasks. Diversity in benchmark datasets and inaccuracies in score functions result in outliers, which need to be accounted for, ever driving the development of more accurate score functions.