BioSeg: a biological sequence data model

ZHU Yangyong, XIONG Yun
2008-01-01
Abstract: How biological sequence data are stored is critical to accessing and processing them efficiently. Existing database management systems cannot efficiently support a biological sequence data type and its operations, so users have to fall back on the text data type of the database management system or on plain text files. As a result, biological sequence data are processed inefficiently. This paper investigates the features of biological sequence data, analyzes and summarizes the query requirements, and presents a novel biological sequence data model named BioSeg. The model is composed of a description and a multi-dimensional array. The description represents annotations and other information related to the biological sequence data, and the multi-dimensional array stores the concrete sequence (for example, the DNA sequence "ATCCCGA"). Algebraic operations defined on BioSeg implement queries over biological sequence data. Querying with BioSeg is more efficient and more practical than the previous approach of storing sequences as text.
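The two-part structure described in the abstract can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the class name `BioSegRecord`, its fields, and the `subsequence` and `find` operations are hypothetical and are not taken from the paper, which does not list its algebra operations in the abstract. A one-dimensional list stands in for the multi-dimensional array.

```python
from dataclasses import dataclass, field


@dataclass
class BioSegRecord:
    """Hypothetical record: a description part plus an array part."""
    description: dict = field(default_factory=dict)  # annotations and related information
    residues: list = field(default_factory=list)     # one array cell per residue

    @classmethod
    def from_string(cls, sequence, **description):
        """Build a record from a sequence string and keyword annotations."""
        return cls(description=dict(description), residues=list(sequence))

    def subsequence(self, start, end):
        """Return a new record holding residues[start:end] (0-based, end exclusive)."""
        return BioSegRecord(dict(self.description), self.residues[start:end])

    def find(self, motif):
        """Return every start index at which the motif occurs (overlaps allowed)."""
        cells = list(motif)
        width = len(cells)
        last_start = len(self.residues) - width
        return [i for i in range(last_start + 1) if self.residues[i:i + width] == cells]


# The example sequence from the abstract; the annotation value is made up.
record = BioSegRecord.from_string("ATCCCGA", organism="example")
print(record.find("CC"))                             # [2, 3]
print("".join(record.subsequence(1, 4).residues))    # TCC
```

Keeping the sequence in array cells rather than in one text value lets positional operations such as slicing work on indices directly, which is the kind of access the abstract contrasts with text-based storage.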