Distributed Short Read Mapping System

Zeng Jin, Zhang Jiaqi, He Zhenying
2011-01-01
Journal of Computer Research and Development
Abstract: CloudBurst, which aims to map the enormous amount of sequence data generated by next-generation DNA sequencing machines, is one of the parallel read-mapping algorithms built on Hadoop, the open-source implementation of MapReduce. However, because it is constrained by the Hadoop framework, CloudBurst can produce an unbalanced load across nodes in some cases. It is also less efficient than it could be, because it cannot begin the Reduce work in parallel until the Map phase has finished. To address these problems, a new distributed short read mapping system, called D-RMAP, is developed.
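The two limitations named above can be illustrated with a toy, single-process model. This is a minimal sketch under stated assumptions, not CloudBurst's or D-RMAP's actual code. It assumes a seed-and-extend scheme in which Map emits every k-mer (seed) of the reference and of the reads, a hash partitioner assigns each seed to one reducer (a stand-in for Hadoop's default hash partitioning), and a reducer's work equals the number of (reference hit, read hit) pairs it must extend. The functions `map_phase`, `shuffle`, `partition`, `reducer_loads` and `run` are hypothetical names chosen for illustration. The shuffle step consumes all map output before any reduce work is counted, mirroring the Map/Reduce barrier, and a repetitive seed such as `AAAA` concentrates most of the work on a single reducer.

```python
from collections import defaultdict
import zlib


def map_phase(name, seq, k, is_ref):
    """Hypothetical mapper: emit (seed, (is_ref, name, offset)) for every k-mer."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k], (is_ref, name, i)


def shuffle(pairs):
    """Group all map output by seed.

    The full list of map output is consumed here before any reduce work
    is counted, modelling the barrier between the Map and Reduce phases.
    """
    groups = defaultdict(list)
    for seed, value in pairs:
        groups[seed].append(value)
    return groups


def partition(seed, num_reducers):
    """Deterministic hash partitioner (assumed stand-in for Hadoop's default)."""
    return zlib.crc32(seed.encode()) % num_reducers


def reducer_loads(groups, num_reducers):
    """Work per reducer = number of (reference hit, read hit) pairs to extend."""
    loads = [0] * num_reducers
    for seed, hits in groups.items():
        ref_hits = sum(1 for is_ref, _, _ in hits if is_ref)
        read_hits = len(hits) - ref_hits
        loads[partition(seed, num_reducers)] += ref_hits * read_hits
    return loads


def run(reference, reads, k, num_reducers):
    """Run the toy map -> shuffle -> partition pipeline and return reducer loads."""
    pairs = list(map_phase("ref", reference, k, True))
    for idx, read in enumerate(reads):
        pairs.extend(map_phase(f"read{idx}", read, k, False))
    return reducer_loads(shuffle(pairs), num_reducers)


if __name__ == "__main__":
    # The reference contains a run of 12 A's, so the seed "AAAA" is highly repetitive.
    reference = "ACGT" + "A" * 12 + "GTCA"
    loads = run(reference, ["AAAAAAAA", "ACGTAAAA"], k=4, num_reducers=4)
    print(loads)
```

With these inputs the seed `AAAA` alone yields 9 reference hits times 6 read hits, i.e. 54 of the 58 candidate pairs, and all 54 land on whichever reducer `AAAA` hashes to, while the average load is only 14.5. Real systems typically split reads into non-overlapping seeds and tolerate mismatches; overlapping exact k-mers are used here only to keep the sketch short.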