Genetic Design of Drugs Without Side-Effects.

X Deng, G Li, Z Li, B Ma, LS Wang
DOI: https://doi.org/10.1137/s0097539701397825
2003-01-01
SIAM Journal on Computing
Abstract: Consider two sets of strings, $B$ (bad genes) and $G$ (good genes), as well as two integers $d_b$ and $d_g$ ($d_b \le d_g$). A frequently occurring problem in computational biology (and other fields) is to find a (distinguishing) substring $s$ of length $L$ that distinguishes the bad strings from the good strings, i.e., such that for each string $s_i \in B$ there exists a length-$L$ substring $t_i$ of $s_i$ with $d(s, t_i) \le d_b$ (close to bad strings), and for every substring $u_i$ of length $L$ of every string $g_i \in G$, $d(s, u_i) \ge d_g$ (far from good strings). We present a polynomial time approximation scheme to settle the problem; i.e., for any constant $\epsilon > 0$, the algorithm finds a string $s$ of length $L$ such that for every $s_i \in B$ there is a length-$L$ substring $t_i$ of $s_i$ with $d(t_i, s) \le (1 + \epsilon) d_b$, and for every substring $u_i$ of length $L$ of every $g_i \in G$, $d(u_i, s) \ge (1 - \epsilon) d_g$, provided that a solution for the original pair $(d_b, d_g)$ with $d_b \le d_g$ exists. Since there is a polynomial number of such pairs $(d_b, d_g)$, we can exhaust all the possibilities in polynomial time to find a good approximation required by the corresponding application problems.
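To make the two conditions in the abstract concrete, the following is a minimal sketch of a checker for the exact (non-approximate) problem, together with a naive exhaustive search. It assumes that $d$ is the Hamming distance, which the abstract does not state, and that strings are over the alphabet ACGT. The function names (`hamming`, `windows`, `is_distinguishing`, `brute_force`) are hypothetical and chosen for illustration only. The search is exponential in $L$ and is not the paper's polynomial time approximation scheme.

```python
from itertools import product


def hamming(a, b):
    """Number of positions where two equal-length strings differ (assumed distance d)."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def windows(text, length):
    """All contiguous substrings of `text` with the given length."""
    return [text[i:i + length] for i in range(len(text) - length + 1)]


def is_distinguishing(s, bad, good, d_b, d_g):
    """Check the two conditions from the abstract for a candidate string s."""
    length = len(s)
    # Close to bad: every bad string has SOME length-L window within d_b of s.
    close_to_bad = all(
        any(hamming(s, t) <= d_b for t in windows(b, length)) for b in bad
    )
    # Far from good: EVERY length-L window of every good string is at least d_g from s.
    far_from_good = all(
        hamming(s, u) >= d_g for g in good for u in windows(g, length)
    )
    return close_to_bad and far_from_good


def brute_force(bad, good, length, d_b, d_g, alphabet="ACGT"):
    """Try every string of the given length; exponential, for tiny inputs only."""
    for candidate in product(alphabet, repeat=length):
        s = "".join(candidate)
        if is_distinguishing(s, bad, good, d_b, d_g):
            return s
    return None
```

For example, with `bad = ["ACGTAC", "TTACGA"]`, `good = ["GGGGGG"]`, `length = 3`, `d_b = 0`, and `d_g = 2`, the candidate `"ACG"` occurs exactly in both bad strings and differs from the only good window `"GGG"` in two positions, so it satisfies both conditions.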