Persistent spectral theory-guided protein engineering

Yuchi Qiu, Guo-Wei Wei
DOI: https://doi.org/10.1038/s43588-022-00394-y
2023-02-22
Nature Computational Science
Abstract: Protein engineering iteratively optimizes protein fitness by screening a gigantic mutational space. Although it is constrained by experimental capacity, various machine learning models have substantially expedited it. Three-dimensional protein structures promise further advantages, but their intricate geometric complexity hinders their use in deep mutational screening. Persistent homology, an established algebraic topology tool for reducing protein structural complexity, fails to capture the homotopic shape evolution during the filtration of given data. Here we introduce a Topology-offered Protein Fitness (TopFit) framework to complement protein sequence and structure embeddings. Equipped with an ensemble regression strategy, TopFit integrates persistent spectral theory (a new topological Laplacian) with two auxiliary sequence embeddings to capture mutation-induced topological invariants, shape evolution and sequence disparity in the protein fitness landscape. The performance of TopFit is assessed on 34 benchmark datasets with 128,634 variants, covering a wide variety of protein structure acquisition modalities and training set sizes.
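The abstract contrasts persistent homology, which records only topological invariants, with persistent spectral theory, whose Laplacian spectra also track shape. What follows is a simplified illustration under stated assumptions and not the TopFit implementation. It builds only the 0-th combinatorial Laplacian L0 = D - A of a distance-threshold graph at each filtration radius. The full persistent Laplacian relates pairs of nested complexes and higher dimensions, and this sketch omits both. The number of zero (harmonic) eigenvalues equals Betti-0, the count of connected components. The smallest nonzero eigenvalue is one nonharmonic quantity that can change while the Betti number stays fixed. The function names `graph_laplacian` and `spectral_filtration`, the radius grid and the toy point cloud are all hypothetical.

```python
import numpy as np

def graph_laplacian(points: np.ndarray, radius: float) -> np.ndarray:
    """Hypothetical helper: 0-th combinatorial Laplacian L0 = D - A of the
    graph joining every pair of distinct points within `radius`
    (the 1-skeleton of a Vietoris-Rips complex)."""
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    # Edge if within the radius, excluding self-loops on the diagonal.
    adj = ((dist <= radius) & ~np.eye(n, dtype=bool)).astype(float)
    return np.diag(adj.sum(axis=1)) - adj

def spectral_filtration(points: np.ndarray, radii, tol: float = 1e-8):
    """Hypothetical helper: for each filtration radius, return
    (Betti-0, smallest nonzero Laplacian eigenvalue)."""
    results = []
    for r in radii:
        eig = np.linalg.eigvalsh(graph_laplacian(points, r))  # L0 is symmetric
        harmonic = eig < tol                 # zero eigenvalues: one per component
        betti0 = int(harmonic.sum())
        nonharmonic = eig[~harmonic]         # spectra that encode shape
        lam_min = float(nonharmonic.min()) if nonharmonic.size else 0.0
        results.append((betti0, lam_min))
    return results

# Toy point cloud: the four corners of a unit square.
square = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
for r, (b0, lam) in zip([0.5, 1.0, 1.5], spectral_filtration(square, [0.5, 1.0, 1.5])):
    print(f"r={r}: Betti-0={b0}, smallest nonzero eigenvalue={lam:.3f}")
```

On the square, Betti-0 is 1 at both r = 1.0 and r = 1.5. Over the same range, the smallest nonzero eigenvalue rises from 2 (a 4-cycle) to 4 (a complete graph). This is the kind of shape change that Betti numbers alone do not register.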