The Protein Sequence Design Problem in Canonical Model on 2D and 3D Lattices

Piotr Berman,Bhaskar DasGupta,Dhruv Mubayi,Robert Sloan,György Turán,Yi Zhang
DOI: https://doi.org/10.1007/978-3-540-27801-6_18
2004-01-01
Abstract:In this paper we investigate the protein sequence design (PSD) problem (also known as the inverse protein folding problem) under the Canonical modelon 2D and 3D lattices [12,25]. The Canonical model is specified by (i) a geometric representation of a target protein structure with amino acid residues via its contact graph, (ii) a binary folding code in which the amino acids are classified as hydrophobic (H) or polar (P), (iii) an energy functionΦ defined in terms of the target structure that should favor sequences with a dense hydrophobic core and penalize those with many solvent-exposed hydrophobic residues (in the Canonical model, the energy function Φ gives an H-H residue contact in the contact graph a value of –1 and all other contacts a value of 0), and (iv) to prevent the solution from being a biologically meaningless all H sequence, the number of H residues in the sequence S is limited by fixing an upper bound λ on the ratio between H and P amino acids. The sequence S is designed by specifying which residues are H and which ones are P in a way that realizes the global minima of the energy function Φ. In this paper, we prove the following results:(1) An earlier proof of NP-completeness of finding the global energy minima for the PSD problem on 3D lattices in [12] was based on the NP-completeness of the same problem on 2D lattices. However, the reduction was not correct and we show that the problem of finding the global energy minima for the PSD problem for 2D lattices can be solved efficiently in polynomial time. But, we show that the problem of finding the global energy minima for the PSD problem on 3D lattices is indeed NP-complete by a providing a different reduction from the problem of finding the largest clique on graphs.(2) Even though the problem of finding the global energy minima on 3D lattices is NP-complete, we show that an arbitrarily close approximation to the global energy minima can indeed be found efficiently by taking appropriate combinations of optimal global energy minima of substrings of the sequence S by providing a polynomial-time approximation scheme (PTAS). Our algorithmic technique to design such a PTAS for finding the global energy minima involves using the shifted slice-and-dice approach in [6,17,18]. This result improves the previous best polynomial-time approximation algorithm for finding the global energy minima in [12] with a performance ratio of \documentclass[12pt]{minimal}\usepackage{amsmath}\usepackage{wasysym}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{amsbsy}\usepackage{mathrsfs}\usepackage{upgreek}\setlength{\oddsidemargin}{-69pt}\begin{document}$1\over 2$\end{document}.
What problem does this paper attempt to address?