Abstract: The multisubunit protein assemblies that play critical roles in biology are the result of evolutionary selection for function of the entire assembly, and hence the subunits in structures such as icosahedral viral capsids often fit together with remarkable shape complementarity [1,2]. In contrast, the large multisubunit assemblies that have been created by de novo protein design, notably the icosahedral nanocages used in a new generation of potent vaccines [3-7], have been built by first designing symmetric oligomers with cyclic symmetry and then assembling these into nanocages while keeping the internal structure fixed [8-14], which results in more porous structures with less extensive shape matching between the components. Such hierarchical "bottom-up" design approaches have the advantage that one interface can be designed and validated in the context of the cyclic oligomer building block [15,16], but the disadvantage that the structural and functional features of the assemblies are limited by the properties of the predesigned building blocks. To overcome this limitation, we set out to develop a "top-down" reinforcement learning based approach to protein nanomaterial design in which both the structures of the subunits and the interactions between them are built up coordinately in the context of the entire assembly. We developed a Monte Carlo tree search (MCTS) method [17,18] which assembles protein monomer structures in the context of an overall architecture guided by a loss function which enables specification of any desired overall structural properties such as shape and porosity. We demonstrate the power of the approach by designing hyperstable icosahedral assemblies more compact than any previously observed protein icosahedral structure (designed or naturally occurring), that have very low porosity and are robust to fusion and display of proteins as complex as influenza hemagglutinin. CryoEM structures of two designs are very close to the computational design models. Our top-down reinforcement learning approach should enable the design of a wide variety of complex protein nanomaterials by direct optimization of overall system properties.
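The abstract describes a Monte Carlo tree search that grows a structure step by step under a loss function. As a rough illustration of that general idea only, the sketch below runs a generic UCT-style MCTS on a toy problem: a chain of beads is grown on a 2D lattice, and a made-up loss rewards compactness and penalizes overlaps. Everything here is an assumption for illustration (the lattice moves standing in for backbone fragments, `CHAIN_LENGTH`, the `loss` terms and weights, the exploration constant); it is not the authors' implementation and does not model proteins or symmetry.

```python
import math
import random

# Toy stand-in for backbone fragments: unit moves on a 2D lattice (assumption).
MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]
CHAIN_LENGTH = 12  # hypothetical number of build steps


def build_chain(moves):
    """Place one bead per move, starting from the origin."""
    pos = [(0, 0)]
    for dx, dy in moves:
        x, y = pos[-1]
        pos.append((x + dx, y + dy))
    return pos


def loss(moves):
    """Hypothetical loss: squared radius of gyration plus a clash penalty."""
    pos = build_chain(moves)
    n = len(pos)
    cx = sum(p[0] for p in pos) / n
    cy = sum(p[1] for p in pos) / n
    rg2 = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in pos) / n
    clashes = n - len(set(pos))  # beads placed on an already occupied site
    return rg2 + 5.0 * clashes


class Node:
    def __init__(self, moves, parent=None):
        self.moves = moves          # tuple of moves chosen so far
        self.parent = parent
        self.children = {}          # move -> Node
        self.visits = 0
        self.total_reward = 0.0

    def is_terminal(self):
        return len(self.moves) == CHAIN_LENGTH

    def untried(self):
        return [m for m in MOVES if m not in self.children]


def uct_child(node, c=1.4):
    """Pick the child maximising mean reward plus an exploration bonus."""
    return max(
        node.children.values(),
        key=lambda ch: ch.total_reward / ch.visits
        + c * math.sqrt(math.log(node.visits) / ch.visits),
    )


def rollout(moves, rng):
    """Finish the chain with random moves; reward is the negative loss."""
    moves = list(moves)
    while len(moves) < CHAIN_LENGTH:
        moves.append(rng.choice(MOVES))
    return -loss(moves), tuple(moves)


def mcts(iterations=2000, seed=0):
    rng = random.Random(seed)
    root = Node(())
    best_reward, best_moves = -math.inf, None
    for _ in range(iterations):
        node = root
        # Selection: descend while the current node is fully expanded.
        while not node.is_terminal() and not node.untried():
            node = uct_child(node)
        # Expansion: add one unexplored child.
        if not node.is_terminal():
            move = rng.choice(node.untried())
            child = Node(node.moves + (move,), parent=node)
            node.children[move] = child
            node = child
        # Simulation: complete the chain at random and score it.
        reward, full = rollout(node.moves, rng)
        if reward > best_reward:
            best_reward, best_moves = reward, full
        # Backpropagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            node.total_reward += reward
            node = node.parent
    return best_moves, -best_reward
```

Calling `mcts()` returns the best complete move sequence seen during the search and its loss. Changing the terms in `loss` changes what the search favors, which mirrors the abstract's point that overall properties are specified through the loss function rather than through fixed building blocks.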

CATANA: an online modelling environment for proteins and nucleic acid nanostructures

small: A Programmatic Nanostructure Design and Modelling Environment

TacoxDNA: A user-friendly web server for simulations of complex DNA structures, from single strands to origami

Facilitating the structural characterisation of non-canonical amino acids in biomolecular NMR

NanoFrame: A web-based DNA wireframe design tool for 3D structures

CHIMERA_NA: A Customizable Mutagenesis Tool for Structural Manipulations in Nucleic Acids and Their Complexes

inSēquio: A Programmable 3D CAD Application for Designing DNA Nanostructures

Functional Geometry Guided Protein Sequence and Backbone Structure Co-Design

Coarse-Grained Nucleic Acid-Protein Model for Hybrid Nanotechnology

Rapid prototyping of arbitrary 2D and 3D wireframe DNA origami

Modeling protein-small molecule conformational ensembles with ChemNet

Progress On Modeling Of Protein Structures And Interactions

A Suite of Designed Protein Cages Using Machine Learning Algorithms and Protein Fragment-Based Protocols

Top-down design of protein nanomaterials with reinforcement learning

Using Integrative Modeling Platform to compute, validate, and archive a model of a protein complex structure

3DStructGen: an interactive web-based 3D structure generation for non-periodic molecule and crystal

An Online Nanoinformatics Platform Empowering Computational Modeling of Nanomaterials by Nanostructure Annotations and Machine Learning Toolkits

AlphaFold 3 - Aided Design of DNA Motifs To Assemble into Triangles

A suite of designed protein cages using machine learning and protein fragment-based protocols

ANNaMo: Coarse-grained modelling for folding and assembly of RNA and DNA systems

Computational design of bifaceted protein nanomaterials with tailorable properties