Streamlining the RosettaScripts Interface: Integrated Logic Selection and LogicSelector
~ contributed by Dr. Jared Adolf-Bryfogle and Frances Chu
Integrated Logic Selection
By Dr. Jared Adolf-Bryfogle, Principal Scientist, Protein Design Lab, Institute for Protein Innovation
RosettaScripts has become one of the main interfaces to the Rosetta Software Suite. Within RosettaScripts, a number of different sections enable customized protocols. Two of the newest components are ResidueSelectors and SimpleMetrics. ResidueSelectors select a group of residues based on specified criteria, while SimpleMetrics output a variety of metrics for each molecule under consideration, many of which are per-residue metrics. I introduced SimpleMetrics to make benchmarking and analysis tasks easier.
For simple tasks, a single ResidueSelector is usually sufficient, but real-world applications usually require complex logic to get at the residues of interest. This was accomplished by defining individual ResidueSelectors and combining them with interspersed Not, And, and Or selectors.
This led to scripts in which a large chunk of the ResidueSelector section was nothing but Or/Not/And selectors.
This was tolerable, but as you can imagine, such scripts were prone to breaking when modified and were very hard to read, document, and extend.
There must be a better way!
I set out to make this less cumbersome by integrating this logic directly into any place that accepts a residue selector. During this time, a colleague, Dr. Ajasja Ljubetič, pointed out core code that could be used instead of writing my own parser. It had been written several years earlier by Andrew Leaver-Fay, one of our senior developers and the architect behind Rosetta3, to provide special selection logic within a TaskOperation called DesignRestrictions, which is typically used for layer design of core, boundary, and surface residue positions.
Working with Andrew and Ajasja, I adapted this core code so that it runs at parse time, and RosettaScripts now supports integrated selection logic throughout. Instead of a large block of dedicated logic selectors, the logic can now be written directly in any component that accepts selectors, for example:
<SelectedResiduesPyMOLMetric name="redesign_res" residue_selector="cys or met or trp or (entire_sheet AND real AND chain_1) or (entire_helix AND real AND chain_1)"/>
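For comparison, here is a sketch of how the same selection might have been written before integrated logic, using only the And and Or selectors. It assumes that cys, met, trp, entire_sheet, entire_helix, real, and chain_1 are already defined earlier in the RESIDUE_SELECTORS block, as in the one-line example above; the intermediate names sheet_in_chain_1, helix_in_chain_1, and redesign_selection are hypothetical.

```xml
<ROSETTASCRIPTS>
    <RESIDUE_SELECTORS>
        <!-- cys, met, trp, entire_sheet, entire_helix, real and chain_1 are assumed to be defined above -->
        <!-- hypothetical intermediate selectors, one per parenthesized term -->
        <And name="sheet_in_chain_1" selectors="entire_sheet,real,chain_1"/>
        <And name="helix_in_chain_1" selectors="entire_helix,real,chain_1"/>
        <!-- the top-level OR of all five terms -->
        <Or name="redesign_selection" selectors="cys,met,trp,sheet_in_chain_1,helix_in_chain_1"/>
    </RESIDUE_SELECTORS>
    <SIMPLE_METRICS>
        <SelectedResiduesPyMOLMetric name="redesign_res" residue_selector="redesign_selection"/>
    </SIMPLE_METRICS>
</ROSETTASCRIPTS>
```

Three extra selector definitions stand in for a single attribute value here, and every negated term would add yet another Not selector.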
This refactoring also brought a few other additions that make working with logic and analysis easier:
Logic operators (AND, OR, NOT) no longer need to be in all caps
The word "not" can now be used instead of "!"
Logic also works inside lists of selectors, for movers and other components that take lists
SelectedResiduesPyMOLMetric (which outputs PyMOL selections for a residue selector) uses the custom_type keyword when naming the selection
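The excerpt below is a sketch that exercises all four additions at once: lowercase operators, the word not, logic inside the comma-separated list taken by the Or selector, and the custom_type option on SelectedResiduesPyMOLMetric. The selector names chain_1, cys, surface, and polar, as well as candidates and candidates_pymol, are hypothetical and assumed to be defined as shown or earlier in the script; the exact PyMOL selection name that custom_type produces is not shown here.

```xml
<ROSETTASCRIPTS>
    <RESIDUE_SELECTORS>
        <!-- chain_1, cys, surface and polar are hypothetical and assumed to be defined above -->
        <!-- lowercase operators, "not" instead of "!", and logic inside a list of selectors -->
        <Or name="candidates" selectors="chain_1 and not cys,surface and polar"/>
    </RESIDUE_SELECTORS>
    <SIMPLE_METRICS>
        <!-- custom_type is used when naming the output PyMOL selection -->
        <SelectedResiduesPyMOLMetric name="candidates_pymol" residue_selector="candidates" custom_type="candidates"/>
    </SIMPLE_METRICS>
</ROSETTASCRIPTS>
```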
This functionality has been useful in our lab and I hope it is useful to you as well!
LogicSelector
By Frances Chu
After my mentor, Dr. Jared Adolf-Bryfogle, made it possible to include logic in any component that accepts selectors, only one piece of functionality was missing. To assign a name to a piece of complex logic, one still needed to wrap it in an And, Or, or Not residue selector, for example:
<Or name="my_selection" selectors="selector_1 AND selector_2,NOT selector_3"/>
To solve this, I wrote the LogicSelector as my first Rosetta code project. It performs no logic of its own; instead, all of the logic is written in its "selector" option. The definition below selects the same residues as the Or example above:
<Logic name="my_selection" selector="(selector_1 AND selector_2) OR NOT selector_3"/>
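Once a piece of logic has a name, it can be reused anywhere a residue selector is accepted. The excerpt below is a sketch that reuses my_selection in a TaskOperation and in two SimpleMetrics; selector_1, selector_2, and selector_3 are placeholders assumed to be defined earlier, and the names repack_only_selection, my_selection_pymol, and my_selection_residues are hypothetical.

```xml
<ROSETTASCRIPTS>
    <RESIDUE_SELECTORS>
        <!-- selector_1, selector_2 and selector_3 are assumed to be defined above -->
        <Logic name="my_selection" selector="(selector_1 and selector_2) or not selector_3"/>
    </RESIDUE_SELECTORS>
    <TASKOPERATIONS>
        <!-- allow only repacking (no design) on the selected residues -->
        <OperateOnResidueSubset name="repack_only_selection" selector="my_selection">
            <RestrictToRepackingRLT/>
        </OperateOnResidueSubset>
    </TASKOPERATIONS>
    <SIMPLE_METRICS>
        <!-- report the same residues as a PyMOL selection and as a residue list -->
        <SelectedResiduesPyMOLMetric name="my_selection_pymol" residue_selector="my_selection"/>
        <SelectedResiduesMetric name="my_selection_residues" residue_selector="my_selection"/>
    </SIMPLE_METRICS>
</ROSETTASCRIPTS>
```

Because the logic string is written only once, changing the selection means editing a single attribute rather than every component that uses it.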
You may notice that, apart from the "name" option, the LogicSelector accepts four options: "selector", "selectors", "residue_selector", and "residue_selectors". These four options are synonyms, and only one of them may be given in a single tag. Why is this the case?
During review of my first GitHub pull request, Dr. Ajasja Ljubetič and Dr. Vikram Mulligan both raised valid points about what the selector-accepting option should be named. On one hand, "selectors" makes sense grammatically, since the option takes multiple residue selectors joined by Boolean operators. On the other hand, "selector" is the name that many existing components, including the Not selector, have long used for this kind of option. Additionally, Vikram pointed out that other kinds of selectors may be added in the future, so a more explicit "residue_selector" option could be useful.
I ultimately decided to make all four option names available, so you can decide for yourself. Keep in mind that whichever version you use, they all perform the same task and all take residue selectors.
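As a sketch, the four definitions below are equivalent spellings of the same selection, each using a different one of the synonymous options. The names chain_1 and cys are placeholders assumed to be defined earlier, and sel_a through sel_d are hypothetical.

```xml
<ROSETTASCRIPTS>
    <RESIDUE_SELECTORS>
        <!-- chain_1 and cys are assumed to be defined above -->
        <!-- each tag uses exactly one of the four synonymous options -->
        <Logic name="sel_a" selector="chain_1 and not cys"/>
        <Logic name="sel_b" selectors="chain_1 and not cys"/>
        <Logic name="sel_c" residue_selector="chain_1 and not cys"/>
        <Logic name="sel_d" residue_selectors="chain_1 and not cys"/>
    </RESIDUE_SELECTORS>
</ROSETTASCRIPTS>
```

All four select the same residues; giving two of these options in one Logic tag is not allowed.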
I hope that as you discover the great functionality of integrated logic selection, this selector comes in handy as well!
(Stay tuned for the next blog post on this topic: SimpleMetrics in protocols, by Dr. Rocco Moretti!)