Streamlining the RosettaScripts Interface: Integrated Logic Selection and LogicSelector

contributed by Dr. Jared Adolf-Bryfogle and Frances Chu


Integrated Logic Selection 

By Dr. Jared Adolf-Bryfogle, Principal Scientist, Protein Design Lab, Institute for Protein Innovation


RosettaScripts has progressed to become one of the main interfaces to the Rosetta Software Suite.  Within RosettaScripts, there are a number of different sections that enable customized protocols.  Two of the newest components are ResidueSelectors and SimpleMetrics.  ResidueSelectors allow one to select a group of residues based on some criteria, while SimpleMetrics allow one to output a variety of metrics for each molecule under consideration, with many of them being per-residue metrics. I introduced SimpleMetrics to make benchmarking and analysis tasks easier. 


For simple tasks, a single ResidueSelector is usually sufficient, but real-world applications usually require long logic strings to get at the residues of interest.  This used to be accomplished by defining individual residue selectors and combining them with interspersed And, Or, and Not selectors to create the logic.


This process led to scripts where a large chunk of the ResidueSelector section of a RosettaScript was just Or/Not/And selectors. 
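
As a rough illustration, such a block might have looked something like the sketch below. The selector names and the selection itself are hypothetical, chosen only to show how much of the section ends up as And/Or/Not plumbing rather than actual residue criteria.

```xml
<RESIDUE_SELECTORS>
    <!-- Leaf selectors: the only lines that actually describe residues -->
    <ResidueName name="cys" residue_name3="CYS"/>
    <ResidueName name="met" residue_name3="MET"/>
    <Chain name="chain_1" chains="1"/>
    <SecondaryStructure name="entire_helix" ss="H"/>

    <!-- Logic plumbing: one named selector per boolean step -->
    <Or name="cys_or_met" selectors="cys,met"/>
    <And name="helix_chain_1" selectors="entire_helix,chain_1"/>
    <Or name="redesign_res" selectors="cys_or_met,helix_chain_1"/>
    <Not name="keep_fixed" selector="redesign_res"/>
</RESIDUE_SELECTORS>
```

Every intermediate step needs its own name, so changing the logic means editing several lines and keeping the names in sync.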


This was tolerable, but as you can imagine, it was prone to breaking during modification and very hard to read, document, and expand.


There must be a better way!

I set out to make this less cumbersome by finding a way to integrate this complex logic anywhere a residue selector is accepted. During this time, a colleague, Dr. Ajasja Ljubetič, pointed out core code written a number of years ago by Andrew Leaver-Fay, one of our main senior developers and the architect behind Rosetta3, that could be used instead of writing my own parsing functionality.  This was special logic for selection within a TaskOperation called DesignRestrictions, which is typically used for layer design of core, surface, and boundary residue positions.


Andrew, Ajasja, and I used and modified this core code so that it runs at parse time, and RosettaScripts now has this integrated logic for residue selection throughout.  Instead of a large block of specialized logic lines, the logic can now be written directly within any component that accepts selectors, for example:


<SelectedResiduesPyMOLMetric name="redesign_res" residue_selector="cys or met or trp or (entire_sheet AND real AND chain_1) or (entire_helix AND real AND chain_1)"/>
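
The names inside such an expression refer to selectors defined in the RESIDUE_SELECTORS section. A minimal sketch of how the pieces might fit together is shown here, assuming a handful of hypothetical leaf selectors. These definitions are illustrative and are not taken from the original script, and the expression is shortened accordingly.

```xml
<RESIDUE_SELECTORS>
    <!-- Only the leaf selectors need to be defined and named -->
    <ResidueName name="cys" residue_name3="CYS"/>
    <ResidueName name="met" residue_name3="MET"/>
    <Chain name="chain_1" chains="1"/>
    <SecondaryStructure name="entire_helix" ss="H"/>
</RESIDUE_SELECTORS>
<SIMPLE_METRICS>
    <!-- The boolean logic lives in the attribute that consumes it -->
    <SelectedResiduesPyMOLMetric name="redesign_res" residue_selector="cys or met or (entire_helix AND chain_1)"/>
</SIMPLE_METRICS>
```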



During this refactoring, I also made a few other additions that make working with logic and analysis easier:

  • Logic no longer needs to be in all caps

  • The word not can now be used instead of !

  • Logic works for lists of selectors when movers or other components take lists

  • SelectedResiduesPyMOLMetric (which outputs PyMOL selections of selectors) uses the custom_type keyword in naming the selection
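
A short sketch of these additions in use, assuming selectors named cys, met, and chain_1 have already been defined elsewhere in the script (the names and the custom_type value are hypothetical):

```xml
<!-- Lowercase operators, with the word "not" in place of "!" -->
<SelectedResiduesPyMOLMetric name="pymol_fixed" custom_type="fixed" residue_selector="chain_1 and not (cys or met)"/>

<!-- Each comma-separated entry in a selector list can itself be a logic expression -->
<Or name="sulfur_chain_1" selectors="cys and chain_1,met and chain_1"/>
```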


This functionality has been useful in our lab and I hope it is useful to you as well!  



LogicSelector

By Frances Chu, 2020 Summer Rosetta Internship, Protein Design Lab, Institute for Protein Innovation

After my mentor, Dr. Jared Adolf-Bryfogle, made it possible to include logic in any component that accepts selectors, there was only one piece of functionality missing. In order to assign a name to a piece of complex logic, one still needed to use one of the And, Or, or Not residue selectors.


<Or name="my_selection" selectors="selector_1 AND selector_2,NOT selector_3"/>


To solve this, as my first Rosetta code project, I wrote the LogicSelector, which does not do any logic itself but allows all of the logic to be written in its "selector" option.


<Logic name="my_selection" selector="(selector_1 AND selector_2) OR NOT selector_3"/>


You may notice that apart from the "name" option, the LogicSelector takes the following options: "selector", "selectors", "residue_selector", and "residue_selectors". These four options are all synonyms, and only one of them can be used at a time. Why is this the case?


During review of my first GitHub pull request, we discussed what the selector-accepting option should be named, and Dr. Ajasja Ljubetič and Dr. Vikram Mulligan both brought up valid points. On one hand, "selectors" makes sense from a grammatical point of view, since the option takes multiple residue selectors separated by boolean operators. On the other hand, "selector" is the option that all residue selectors (And, Or, Not) have taken in the past. Additionally, Vikram pointed out that there may be more kinds of selectors in the future; thus, having a more explicit "residue_selector" option could be useful.


I ultimately decided to have all four option names available to the user, so you have the opportunity to decide for yourself. Keep in mind that no matter which version you use, they all perform the same task and all take residue selectors. 
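
To make that concrete, the four lines below are a sketch of equivalent definitions that differ only in which option name is used. The names selector_1, selector_2, and selector_3 are placeholders for selectors defined elsewhere, and the metric at the end is just one hypothetical consumer of the named selection.

```xml
<RESIDUE_SELECTORS>
    <!-- Four spellings of the same option; use exactly one per Logic tag -->
    <Logic name="sel_a" selector="(selector_1 AND selector_2) OR NOT selector_3"/>
    <Logic name="sel_b" selectors="(selector_1 AND selector_2) OR NOT selector_3"/>
    <Logic name="sel_c" residue_selector="(selector_1 AND selector_2) OR NOT selector_3"/>
    <Logic name="sel_d" residue_selectors="(selector_1 AND selector_2) OR NOT selector_3"/>
</RESIDUE_SELECTORS>
<SIMPLE_METRICS>
    <!-- A named Logic selector is referenced like any other selector -->
    <SelectedResiduesPyMOLMetric name="show_sel" residue_selector="sel_a"/>
</SIMPLE_METRICS>
```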


I hope that as you discover the great functionality of integrated logic selection, this selector comes in handy as well!


(Stay tuned for the next blog on this topic: SimpleMetrics in protocols By Dr. Rocco Moretti!)
