Big Data Quality: A Survey

M. Serhani,R. Dssouli,Ikbal Taleb
DOI: https://doi.org/10.1109/BigDataCongress.2018.00029
2018-07-01
Abstract:With the advances in communication technologies and the high amount of data generated, collected, and stored, it becomes crucial to manage the quality of this data deluge in an efficient and cost-effective way. The storage, processing, privacy and analytics are the main keys challenging aspects of Big Data that require quality evaluation and monitoring. Quality has been recognized by the Big Data community as an essential facet of its maturity. Yet, it is a crucial practice that should be implemented at the earlier stages of its lifecycle and progressively applied across the other key processes. The earlier we incorporate quality the full benefit we can get from insights. In this paper, we first identify the key challenges that necessitates quality evaluation. We then survey, classify and discuss the most recent work on Big Data management. Consequently, we propose an across-the-board quality management framework describing the key quality evaluation practices to be conducted through the different Big Data stages. The framework can be used to leverage the quality management and to provide a roadmap for Data scientists to better understand quality practices and highlight the importance of managing the quality. We finally, conclude the paper and point to some future research directions on quality of Big Data.
Computer Science
What problem does this paper attempt to address?