KSI: a DNA sequence matching library for terabyte-scale bio-data

Xiquan Zhao, Xu Li, Huiwei Lv, Guangming Tan
DOI: https://doi.org/10.3772/j.issn.1002-0470.2015.12.001
2015-01-01
Abstract: Current mainstream software for DNA sequence analysis performs much repetitive work, because each tool mostly implements its own set of sequence storage and query functions for its own use. Their designs also ignore the requirements of parallelism, scalability and distributed environments, even as the volume of DNA data increases rapidly. To meet the needs of analyzing DNA sequences from different species and to adapt to the rapid growth of DNA data, a DNA sequence matching library for terabyte-scale bio-data, called the k-mer searching interface (KSI), was designed and implemented based on k-mer matching, the basic operation of DNA sequence processing. KSI provides a set of application programming interfaces (APIs) for distributed computing environments and optimizes DNA sequence matching in the biological computing field. The experimental results show that KSI is an efficient and scalable solution for big bio-data processing.
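To illustrate the k-mer matching operation that the abstract names as the basis of KSI, the following is a minimal single-machine sketch in Python. It is not KSI's API: the function names `build_kmer_index` and `match_kmers`, the in-memory dictionary index, and the sample sequences are all assumptions made for illustration, and the distributed storage and query layers described in the abstract are not modeled here.

```python
from collections import defaultdict


def build_kmer_index(sequences, k):
    """Map every length-k substring (k-mer) to the places it occurs.

    `sequences` is a dict of {sequence_id: DNA string}. The returned index
    maps each k-mer to a list of (sequence_id, position) pairs.
    Hypothetical helper, not part of KSI.
    """
    index = defaultdict(list)
    for seq_id, seq in sequences.items():
        # Slide a window of width k across the sequence.
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((seq_id, pos))
    return index


def match_kmers(index, query, k):
    """Look up every k-mer of `query` in the index.

    Returns a list of (query_position, sequence_id, sequence_position)
    tuples, one per exact k-mer hit. Hypothetical helper, not part of KSI.
    """
    hits = []
    for qpos in range(len(query) - k + 1):
        for seq_id, pos in index.get(query[qpos:qpos + k], ()):
            hits.append((qpos, seq_id, pos))
    return hits


# Toy reference set (assumed data, for illustration only).
reference = {"s1": "ACGTAC", "s2": "GTACGG"}
idx = build_kmer_index(reference, k=3)
result = match_kmers(idx, "TACG", k=3)
```

In a distributed setting, an index of this kind would typically be partitioned across nodes by k-mer; how KSI does this is not stated in the abstract.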