Megale: A Metadata-Driven Graph-Based System for Data Lake Exploration
Doulkifli Boukraa, Meriem Bouraoui, Chaima Grine, Racha Ouahab
DOI: https://doi.org/10.1142/s0219622024500135
2024-12-18
International Journal of Information Technology & Decision Making
Abstract: Data lakes are storage repositories that hold large amounts of data (big data) in its native format, whether structured, semi-structured or unstructured. Data lakes are open to a wide range of use cases, such as carrying out advanced analytics and extracting knowledge patterns. However, simply dumping data into a data lake only leads to a data swamp. To prevent this, enterprises can adopt best practices, among them the management of data lake metadata. A growing body of research has proposed metadata systems and models for data lakes, with a special interest in model genericity. However, existing models fail to cover all aspects of a data lake because of their static modeling approach. Moreover, they do not fully cover features essential to effective metadata management, namely governance, visibility and uniform treatment of data lake concepts. In this paper, we propose a dynamic modeling approach that provides these features, based on two main constructs: data lake concept and data lake relationship. We showcase our approach with Megale, a graph-based metadata system for NoSQL data lake exploration. We present a proof-of-concept implementation of Megale and show its effectiveness and efficiency in exploring the data lake.
computer science, information systems, artificial intelligence, interdisciplinary applications, operations research & management science