Spatial big data architecture: From Data Warehouses and Data Lakes to the LakeHouse

Soukaina Ait Errami, Hicham Hajji, Kenza Ait El Kadi, Hassan Badir
DOI: https://doi.org/10.1016/j.jpdc.2023.02.007
IF: 4.542
2023-02-19
Journal of Parallel and Distributed Computing
Abstract: The growth of location-related data has created an opportunity to gain rich insights by building systems that support it. For a long time, spatial extensions of Data Warehouses helped integrate the spatial dimension into the decision-making process. With the rise of Big Data, spatial data began to proliferate, and Spatial Data Warehouses showed several limitations, such as scalability issues and an inability to support IoT data streams. The Data Lake was introduced as a paradigm to solve these issues. However, it has inconsistencies of its own, which led to the emergence of a new generation of data management systems: the Data LakeHouse. This new data architecture combines governed, reliable Data Warehouses with flexible, scalable, and cost-effective Data Lakes. We present a literature overview of these transitions and their causes, and address Spatial Big Data requirements within the Data LakeHouse by presenting the components and best practices for building a Data LakeHouse architecture optimized for the storage and computing of Spatial Big Data, as well as the successive components of the Spatial Data LakeHouse architecture.
computer science, theory & methods