A novel approach for large‐scale environmental data partitioning on cloud and on‐premises storage for compute continuum applications
Gennaro Mellone, Ciro Giuseppe De Vita, Dante D. Sánchez‐Gallegos, Genaro Sánchez‐Gallegos, Catherine A. Torres‐Charles, Javier Garcia‐Blas, Jesús Carretero Pérez, J. L. Gonzalez‐Compean, Giuliano Laccetti
DOI: https://doi.org/10.1002/cpe.7893
2023-08-24
Concurrency and Computation: Practice and Experience
Abstract: Cloud‐based services have proved useful in several research fields, such as engineering, health science, and astrophysics. The computational environmental science community has developed a strong need for cloud facilities to store, process, and manage data from observations and from numerical models used for simulations and forecasts. Weather forecast models and global sensor networks handle multidimensional geo‐referenced data sets. However, environmental data consumer applications usually require only a relatively small slice of the multidimensional input data to analyze a specific area or time interval. Hence, reducing the data dimension for information retrieval is mandatory. This paper presents a twofold solution: a technique to load and retrieve sliced multidimensional data sets on different cloud services, such as Amazon Web Services (AWS), Google Cloud Platform, and Microsoft Azure. The experimental results obtained on these cloud services show that the proposed method can significantly speed up loading and retrieving data slices compared with working with the entire data set in bulk or through an OPeNDAP server.
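The core idea of retrieving only the slice a consumer needs, instead of the whole multidimensional data set, can be illustrated with a minimal Python sketch. It assumes a (time, latitude, longitude) grid partitioned by time step, with a plain dictionary standing in for a cloud object store. The key scheme and the functions `partition_by_time` and `retrieve_slice` are hypothetical and do not reproduce the authors' implementation.

```python
from typing import Dict, List, Tuple

Frame = List[List[float]]          # one time step, indexed [lat][lon]
ObjectStore = Dict[str, Frame]     # hypothetical stand-in for a cloud bucket


def partition_by_time(grid: List[Frame], dataset: str) -> ObjectStore:
    """Split a (time, lat, lon) grid into one stored object per time step.

    The "<dataset>/t=NNNN" key scheme is an illustrative assumption.
    """
    return {f"{dataset}/t={t:04d}": frame for t, frame in enumerate(grid)}


def retrieve_slice(store: ObjectStore, dataset: str,
                   t_range: Tuple[int, int],
                   lat_range: Tuple[int, int],
                   lon_range: Tuple[int, int]) -> List[Frame]:
    """Read only the objects for the requested time steps, then crop lat/lon.

    All ranges are half-open [start, stop) index intervals.
    """
    (t0, t1), (la0, la1), (lo0, lo1) = t_range, lat_range, lon_range
    result: List[Frame] = []
    for t in range(t0, t1):
        frame = store[f"{dataset}/t={t:04d}"]   # one object read per time step
        result.append([row[lo0:lo1] for row in frame[la0:la1]])
    return result


if __name__ == "__main__":
    # Synthetic grid: 4 time steps x 3 latitudes x 5 longitudes,
    # where each value encodes its indices as 100*t + 10*i + j.
    grid = [[[100 * t + 10 * i + j for j in range(5)] for i in range(3)]
            for t in range(4)]
    store = partition_by_time(grid, "forecast")
    sub = retrieve_slice(store, "forecast", (1, 3), (0, 2), (2, 4))
    print(sub)  # [[[102, 103], [112, 113]], [[202, 203], [212, 213]]]
```

In this toy example, a consumer interested in two time steps reads two of the four stored objects rather than the entire grid. Partitioning along time is only one possible choice; a real deployment would pick the partitioning dimension(s) to match the typical area or time-interval queries.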
computer science, theory & methods, software engineering