A PTAS for Distinguishing (Sub)string Selection

Xiaotie Deng, Guojun Li, Zimao Li, Bin Ma, Lusheng Wang
DOI: https://doi.org/10.1007/3-540-45465-9_63
2002-01-01
Abstract: Consider two sets of strings, B (bad genes) and G (good genes), as well as two integers d_b and d_g (d_b ≤ d_g). A frequently occurring problem in computational biology (and other fields) is to find a (distinguishing) substring s of length L that distinguishes the bad strings from the good strings, i.e., for each string s_i ∈ B there exists a length-L substring t_i of s_i with d(s, t_i) ≤ d_b (close to bad strings), and for every substring u_i of length L of every string g_i ∈ G, d(s, u_i) ≥ d_g (far from good strings). We present a polynomial time approximation scheme to settle the problem, i.e., for any constant ε > 0, the algorithm finds a string s of length L such that for every s_i ∈ B, there is a length-L substring t_i of s_i with d(t_i, s) ≤ (1 + ε)d_b, and for every substring u_i of length L of every g_i ∈ G, d(u_i, s) ≥ (1 − ε)d_g, if a solution to the original pair (d_b ≤ d_g) exists.
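The two conditions in the abstract can be made concrete with a small checker. The sketch below is not the paper's approximation scheme; it only tests whether a given candidate s satisfies the (optionally ε-relaxed) constraints. It assumes d is the Hamming distance, which the abstract does not state explicitly, and the names `hamming`, `windows`, and `distinguishes` are hypothetical, chosen for illustration.

```python
from typing import List, Sequence


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ.

    Assumption: d(., .) in the abstract is taken to be Hamming distance.
    """
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def windows(t: str, length: int) -> List[str]:
    """All length-`length` substrings of t (empty if t is too short)."""
    return [t[i:i + length] for i in range(len(t) - length + 1)]


def distinguishes(
    s: str,
    bad: Sequence[str],
    good: Sequence[str],
    d_b: int,
    d_g: int,
    eps: float = 0.0,
) -> bool:
    """Check the distinguishing conditions for a candidate s.

    With eps = 0 this is the exact problem; with eps > 0 it is the
    relaxed guarantee (1 + eps) * d_b and (1 - eps) * d_g.
    """
    length = len(s)
    # Every bad string must contain at least one window close to s.
    close_to_bad = all(
        any(hamming(s, t) <= (1 + eps) * d_b for t in windows(b, length))
        for b in bad
    )
    # Every window of every good string must be far from s.
    far_from_good = all(
        hamming(s, u) >= (1 - eps) * d_g
        for g in good
        for u in windows(g, length)
    )
    return close_to_bad and far_from_good
```

For example, `distinguishes("ACGT", ["TTACGTTT", "ACGAAA"], ["GGGGGG"], d_b=1, d_g=3)` holds, because each bad string has a window within distance 1 of "ACGT" while every window of the good string differs from it in 3 positions. Note the asymmetry: the bad-string condition is existential (one window suffices) while the good-string condition is universal (all windows must be far).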