BIOINDEX: AN EFFICIENT INDEX FOR SIMILARITY QUERIES OF BIOLOGICAL SEQUENCES

Qiu Boren, Xiong Yun, Zhu Yangyong
DOI: https://doi.org/10.3969/j.issn.1000-386X.2009.10.001
2009-01-01
Abstract: Managing biological data effectively and providing efficient query methods are important research topics in biological information processing. BioSeg is a novel biological sequence data model, and query optimization is an important part of developing a biological database management system. This paper surveys current index techniques for biological data. Making use of the features of the BioSeg data model, it designs BioIndex, a novel index for biological sequence data that supports similarity queries over biological sequences, and proposes a corresponding query algorithm. First, sequence patterns are mined from the biological sequence set with MEME, and these patterns are used as index sequences to construct a sequence index database. Then, the algorithm searches the sequence index database for the index sequence with the highest similarity to the query sequence, and the sequences in the biological database that correspond to that index sequence are taken as the candidate sequences. Finally, the sequences with the highest similarity to the query are found among the candidate sequences. Experimental results on real biological sequence data show that the query algorithm based on BioIndex improves query efficiency.
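The three-step query described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions made for illustration: the patterns are passed in directly rather than mined with MEME, a pattern is linked to a database sequence by plain substring containment, and `difflib.SequenceMatcher` stands in for whatever similarity measure the paper uses. The function names `similarity`, `build_index`, and `query` are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional


def similarity(a: str, b: str) -> float:
    """Stand-in similarity score in [0, 1] (assumption, not the paper's measure)."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def build_index(sequences: Dict[str, str], patterns: List[str]) -> Dict[str, List[str]]:
    """Step 1: map each pattern to the IDs of the sequences that contain it.

    In the paper the patterns come from MEME; here they are supplied by the caller.
    """
    return {
        pattern: [sid for sid, seq in sequences.items() if pattern in seq]
        for pattern in patterns
    }


def query(q: str, sequences: Dict[str, str], index: Dict[str, List[str]]) -> Optional[str]:
    """Return the ID of the candidate sequence most similar to q, or None."""
    if not index:
        return None
    # Step 2: find the index pattern most similar to the query sequence.
    best_pattern = max(index, key=lambda p: similarity(p, q))
    candidates = index[best_pattern]
    if not candidates:
        return None
    # Step 3: compare the query only against the candidate sequences.
    return max(candidates, key=lambda sid: similarity(sequences[sid], q))


if __name__ == "__main__":
    db = {"s1": "ACGTACGTGG", "s2": "TTGACATTGA", "s3": "ACGTTCGTGG"}
    idx = build_index(db, ["ACGT", "TTGA"])
    print(query("ACGTACGTGG", db, idx))
```

The point of the design is that the full similarity comparison in step 3 runs only over the candidates attached to one index pattern, not over the whole database, which is where the efficiency gain claimed in the abstract would come from.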