DNA Sequence Splicing Algorithm Based on Spark

Xu Pan, Xue-liang Fu, Gai-fang Dong, Hong-hui Li
DOI: https://doi.org/10.1109/iciicii.2016.0024
2016-01-01
Abstract: Bioinformatics is an interdisciplinary field concerned with processing biological information, and DNA sequence splicing is one of its research topics. At present, most parallel splicing algorithms run on MapReduce, whose repeated reads from and writes to hard disk slow the algorithm down. This paper proposes running the algorithm on Spark's in-memory computing model to solve this problem, together with a new K-2 bit matching method. Experimental results show that the Spark-based environment combined with this method preserves the accuracy of the splicing results while making the algorithm more efficient.
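The abstract does not define the K-2 bit matching step. As an illustration only, the sketch below assumes one plausible reading: two k-mers are linked when the last k-2 bases of one equal the first k-2 bases of the other, instead of the k-1 overlap commonly used in k-mer based assembly. The function names `kmers` and `overlap_pairs` are hypothetical and do not come from the paper, and the sketch uses plain Python rather than Spark.

```python
from collections import defaultdict


def kmers(read, k):
    """Return every length-k substring of a read, in order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def overlap_pairs(reads, k, overlap):
    """Link k-mers whose suffix of length `overlap` equals another k-mer's prefix.

    Assumption for illustration: overlap = k - 2 stands in for "K-2 bit matching".
    """
    unique = sorted({km for read in reads for km in kmers(read, k)})

    # Index k-mers by their prefix so each suffix lookup is a dictionary access.
    by_prefix = defaultdict(list)
    for km in unique:
        by_prefix[km[:overlap]].append(km)

    pairs = []
    for km in unique:
        for nxt in by_prefix.get(km[-overlap:], []):
            if nxt != km:
                pairs.append((km, nxt))
    return pairs


reads = ["ACGTAC"]
k = 4
pairs_k1 = overlap_pairs(reads, k, k - 1)  # conventional k-1 overlap
pairs_k2 = overlap_pairs(reads, k, k - 2)  # assumed k-2 overlap
```

On this toy read the shorter overlap links a different set of k-mer pairs than the k-1 overlap does, which is the kind of trade-off a matching rule like this would have to manage. In a Spark setting the prefix index would presumably become a keyed join over distributed data; that part is not shown here.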