Abstract: In this paper we investigate the protein sequence design (PSD) problem (also known as the inverse protein folding problem) under the Canonical model on 2D and 3D lattices [12,25]. The Canonical model is specified by (i) a geometric representation of a target protein structure with amino acid residues via its contact graph, (ii) a binary folding code in which the amino acids are classified as hydrophobic (H) or polar (P), (iii) an energy function Φ defined in terms of the target structure that should favor sequences with a dense hydrophobic core and penalize those with many solvent-exposed hydrophobic residues (in the Canonical model, the energy function Φ gives an H-H residue contact in the contact graph a value of -1 and all other contacts a value of 0), and (iv) an upper bound λ on the ratio between H and P amino acids, which limits the number of H residues in the sequence S and prevents the solution from being a biologically meaningless all-H sequence. The sequence S is designed by specifying which residues are H and which ones are P in a way that realizes the global minimum of the energy function Φ. In this paper, we prove the following results: (1) An earlier proof of NP-completeness of finding the global energy minima for the PSD problem on 3D lattices in [12] was based on the NP-completeness of the same problem on 2D lattices. However, that reduction was not correct: we show that the problem of finding the global energy minima for the PSD problem on 2D lattices can be solved in polynomial time.
Nevertheless, we show that the problem of finding the global energy minima for the PSD problem on 3D lattices is indeed NP-complete by providing a different reduction from the problem of finding the largest clique on graphs. (2) Even though the problem of finding the global energy minima on 3D lattices is NP-complete, we show, by providing a polynomial-time approximation scheme (PTAS), that an arbitrarily close approximation to the global energy minima can be found efficiently by taking appropriate combinations of optimal global energy minima of substrings of the sequence S. Our algorithmic technique for designing this PTAS uses the shifted slice-and-dice approach in [6,17,18]. This result improves on the previous best polynomial-time approximation algorithm for finding the global energy minima in [12], which has a performance ratio of $\frac{1}{2}$.
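To make the objective in the abstract concrete, the following is a minimal sketch of the Canonical-model energy (each H-H contact counts -1, every other contact 0) together with an exhaustive search over H/P labelings. It is an illustration only: the search is exponential in the number of residues and is not the polynomial-time 2D algorithm or the PTAS of the paper. The function names, the toy four-residue contact graph, and the reading of the bound as "number of H ≤ λ × number of P" are assumptions made for this example.

```python
from itertools import combinations


def energy(contacts, labels):
    """Canonical-model energy: -1 for every H-H contact, 0 for all others."""
    return -sum(1 for u, v in contacts if labels[u] == "H" and labels[v] == "H")


def satisfies_ratio(labels, lam):
    """Assumed reading of the H/P bound: (#H) <= lam * (#P)."""
    n_h = labels.count("H")
    n_p = len(labels) - n_h
    return n_h <= lam * n_p


def brute_force_design(n, contacts, lam):
    """Try every H/P labeling of n residues; exponential time, illustration only."""
    best_labels, best_energy = ["P"] * n, 0  # the all-P sequence is always feasible
    for k in range(1, n + 1):
        for h_positions in combinations(range(n), k):
            labels = ["P"] * n
            for i in h_positions:
                labels[i] = "H"
            if not satisfies_ratio(labels, lam):
                continue
            e = energy(contacts, labels)
            if e < best_energy:
                best_labels, best_energy = labels, e
    return "".join(best_labels), best_energy


# Hypothetical toy contact graph: four residues whose contacts form a 4-cycle.
contacts = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(brute_force_design(4, contacts, lam=1.0))
```

With λ = 1 at most two of the four residues may be H, so the best any labeling can do on this toy graph is a single H-H contact.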

The Protein Sequence Design Problem in Canonical Model on 2D and 3D Lattices

Global-Context Aware Generative Protein Design

Protein sequence design by conformational landscape optimization

Folding Lattice HP Model of Proteins Using the Bond-Fluctuation Model

Constrained Pairwise and Center-Star Sequences Alignment Problems

Conformation Studies of Two-Dimensional Model Molecules of Proteins in the Process of Folding

A critical analysis of computational protein design with sparse residue interaction graphs

Protein sequence design by explicit energy landscape optimization

A Deterministic Optimization Approach to Protein Sequence Design Using Continuous Models

Protein Design by Integrating Machine Learning with Quantum Annealing and Quantum-inspired Optimization

An Efficient Algorithm for Computational Protein Design Problem

Using quantum annealing to design lattice proteins

Designability, Thermodynamic Stability, And Dynamics In Protein Folding: A Lattice Model Study

An Exact Algorithm for Side-Chain Placement in Protein Design

Protein Design Using Physics Informed Neural Networks

Sequence Design and Folding Dynamics of Lattice Protein-Like Models

Lattice protein design using Bayesian learning

The Designability of Protein Structures: A Lattice-Model Study using the Miyazawa-Jernigan Matrix

The cavity method to protein design problem

Improved hybrid optimization algorithm for 3D protein structure prediction

Computational Protein Design Using AND/OR Branch-and-Bound Search