De-Bruijn graph with MapReduce framework towards metagenomic data classification

Md. Sarwar Kamal,Sazia Parvin,Amira S. Ashour,Fuqian Shi,Nilanjan Dey
DOI: https://doi.org/10.1007/s41870-017-0005-z
2017-02-23
International Journal of Information Technology
Abstract:AbstractMetagenomic gene classifications are significant in bioinformatics and computational biology research. There are huge interrelated datasets that deal with human characteristics, diseases and molecular functionalities. Analysis of metagenomic reorganization is a challenging issue due to their diversity and efficient classification tools. Graph based MapReducing approach can easily handle the genomic diversity. MapReduce has two parts such as mapping and reducing. In mapping phase, a recursive naive algorithm is used for generating K-mers. De-Bruijn graph is a compact representation of k-mers and finds out an optimal path (solution) for genome assembly. Similarity metrics have been utilized for finding similarity among the De-Oxy Ribonucleic Acid (DNA) sequences. In reducing side, Jaccard similarity and purity of clustering are used as datasets classifier to classify the sequences based on their similarity. Reducing phase can easily classify the DNA sequences from large database. Extensive experimental analysis has demonstrated that graph based MapReduce analysis generate optimal solutions. Remarkable improvements in time and space have recorded and observed. The results established that proposed framework performed faster than SSMA-SFSD when classified elements are increased. It provided better accuracy for metagenomic data clustering.
What problem does this paper attempt to address?