Modeling Data Lake Metadata with a Data Vault

Iuri Nogueira, Maram Romdhane, Jérôme Darmont
DOI: https://doi.org/10.48550/arXiv.1807.04035
2018-07-11
Databases
Abstract: With the rise of big data, business intelligence had to find solutions for managing even greater data volumes and variety than in data warehouses, which proved ill-adapted. Data lakes answer these needs from a storage point of view, but require managing adequate metadata to guarantee efficient access to data. Starting from a multidimensional metadata model that was designed for an industrial heritage data lake and lacks schema evolvability, we propose in this paper to use ensemble modeling, and more precisely a data vault, to address this issue. To illustrate the feasibility of this approach, we instantiate our metadata conceptual model into relational and document-oriented logical and physical models. We also compare the physical models in terms of metadata storage and query response time.