Distributed data management using MapReduce

Feng Li, Beng Chin Ooi, M. Tamer Özsu, Sai Wu
DOI: https://doi.org/10.1145/2503009
2014-01-01
Abstract: MapReduce is a framework for processing and managing large-scale datasets in a distributed cluster. It has been used for applications such as generating search indexes, document clustering, access log analysis, and various other forms of data analytics. MapReduce adopts a flexible computation model with a simple interface consisting of map and reduce functions, whose implementations can be customized by application developers. Since its introduction, a substantial amount of research effort has been directed toward making it more usable and efficient for supporting database-centric operations. In this article, we aim to provide a comprehensive review of a wide range of proposals and systems that focus fundamentally on the support of distributed data management and processing using the MapReduce framework.
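The map/reduce interface mentioned in the abstract can be illustrated with a minimal single-process sketch. This is not taken from the article: the names `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical, word counting is chosen only as a familiar example, and the in-memory grouping stands in for the shuffle step that a real cluster would perform across machines.

```python
from collections import defaultdict


def map_fn(doc_id, text):
    """User-defined map: emit a (word, 1) pair for every word in a document."""
    for word in text.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """User-defined reduce: sum all counts emitted for one word."""
    return word, sum(counts)


def run_mapreduce(records, map_fn, reduce_fn):
    """Hypothetical local driver: apply map, group by key, then apply reduce.

    A real framework would distribute the map and reduce tasks over a cluster;
    here everything runs in one process for illustration only.
    """
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)  # stand-in for the shuffle phase
    return dict(reduce_fn(k, vs) for k, vs in sorted(groups.items()))


if __name__ == "__main__":
    docs = [("d1", "a b a"), ("d2", "b c")]
    print(run_mapreduce(docs, map_fn, reduce_fn))
```

Under these assumptions the developer supplies only `map_fn` and `reduce_fn`, while the driver handles grouping. That division of labor is what the abstract refers to as a simple, customizable interface.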