A seriate coverage filtration approach for homology search.

Hsiao Ping Lee, Yin-Te Tsai, Chuan Yi Tang
DOI: https://doi.org/10.1145/967900.967937
2004-01-01
Abstract: Homology search within genomic databases is a fundamental and crucial task in biological knowledge discovery. As databases grow exponentially in size and are accessed ever more often, efficient retrieval becomes increasingly essential in bioinformatics. Because biological data are so varied, similar sequences must not only fall within some error tolerance but also exceed some seriate coverage level. In this paper, we propose a seriate coverage filtration approach that extracts homologies from databases efficiently. Our approach performs a lossless filtration and can be implemented as a preprocessing step for existing search heuristics. Our method converts a user's requested error and seriate coverage levels into thresholds of interest. Accordingly, we transform the task of homology discovery into a variation of the longest increasing subsequence problem and design an efficient algorithm for it. In performance tests, our approach achieves an attractive filtration quality.
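As background for the reduction mentioned in the abstract, the following is a minimal Python sketch of the classic longest increasing subsequence (LIS) problem, solved in O(n log n) time with binary search. It is not the paper's algorithm: the abstract does not describe the variation the authors use, and the function name lis_length is chosen here purely for illustration.

```python
from bisect import bisect_left


def lis_length(seq):
    """Return the length of the longest strictly increasing subsequence.

    Classic textbook LIS only; the variation used in the paper is not
    specified in the abstract and is not reproduced here.
    """
    # tails[k] holds the smallest possible last value of any strictly
    # increasing subsequence of length k + 1 seen so far.
    tails = []
    for x in seq:
        # Leftmost position whose tail is >= x.
        i = bisect_left(tails, x)
        if i == len(tails):
            # x extends the longest subsequence found so far.
            tails.append(x)
        else:
            # x gives a smaller tail for subsequences of length i + 1.
            tails[i] = x
    return len(tails)


if __name__ == "__main__":
    print(lis_length([3, 1, 4, 1, 5, 9, 2, 6]))
```

Each element triggers one binary search over tails, which is where the O(n log n) bound comes from; a filtration step built on a variant of this idea would presumably compare the resulting length against a threshold, but that detail is an assumption rather than something stated in the abstract.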