On-the-Fly Fusion of Remotely-Sensed Big Data Using an Elastic Computing Paradigm with a Containerized Spark Engine on Kubernetes

Wei Huang, Jianzhong Zhou, Dongying Zhang
DOI: https://doi.org/10.3390/s21092971
IF: 3.9
2021-04-23
Sensors
Abstract: Remotely-sensed satellite image fusion is indispensable for the generation of long-term gap-free Earth observation data. While cloud computing (CC) provides the big picture for remote sensing (RS) big data (RSBD), the fundamental question of how to fuse RSBD efficiently on CC platforms has not yet been settled. To this end, we propose a lightweight cloud-native framework for the elastic processing of RSBD in this study. With the scaling mechanisms provided by both the Infrastructure as a Service (IaaS) and Platform as a Service (PaaS) layers of CC, the Spark-on-Kubernetes operator model running in the framework can enhance the efficiency of Spark-based algorithms without considering bottlenecks such as task latency caused by an unbalanced workload, and can ease the burden of tuning the performance parameters of parallel algorithms. Internally, we propose a task scheduling mechanism (TSM) to dynamically change the Spark executor pods’ affinities to the computing hosts. The TSM learns the workload of a computing host from the ratio between the number of completed and failed tasks on that host, and dispatches Spark executor pods to newer and less-overwhelmed computing hosts. To illustrate the advantage, we implement a parallel enhanced spatial and temporal adaptive reflectance fusion model (PESTARFM) that enables the efficient fusion of big RS images with a Spark aggregation function. We construct an OpenStack cloud computing environment to test the usability of the framework. According to the experiments, the TSM can improve the performance of the PESTARFM by about 11.7% when using only PaaS scaling. When using both IaaS and PaaS scaling, the maximum performance gain with the TSM can exceed 13.6%. The fusion of big Sentinel and PlanetScope images requires less than 4 min in the experimental environment.
Engineering, Electrical & Electronic; Chemistry, Analytical; Instruments & Instrumentation
What problem does this paper attempt to address?
This paper addresses the efficient fusion of remote sensing big data (RSBD) on cloud computing platforms. Although cloud computing (CC) provides the big picture for handling large-scale remote sensing data, the question of how to fuse RSBD efficiently on CC platforms has not been fully resolved. The authors therefore propose a lightweight cloud-native framework for the elastic processing of RSBD. By using the scaling mechanisms provided by Infrastructure as a Service (IaaS) and Platform as a Service (PaaS), the Spark-on-Kubernetes operator model running in the framework can improve the efficiency of Spark-based algorithms, avoid bottlenecks such as task latency caused by an unbalanced workload, and simplify the tuning of performance parameters for parallel algorithms. Specifically, the paper proposes a task scheduling mechanism (TSM) that dynamically changes the affinity between Spark executor pods and computing hosts. The TSM learns each host's workload from the ratio between completed and failed tasks and dispatches executor pods to newer and less-overwhelmed hosts. The paper also implements a parallel enhanced spatial and temporal adaptive reflectance fusion model (PESTARFM) to make the fusion of large remote sensing images more efficient. Experimental results show that the TSM improves the performance of PESTARFM by about 11.7% when only PaaS scaling is used, and that the maximum performance gain exceeds 13.6% when IaaS and PaaS scaling are used together. In the experimental environment, the fusion of large Sentinel and PlanetScope images takes less than 4 minutes.
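The TSM idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `HostStats` structure, the scoring formula, and the mapping from score to affinity weight are all assumptions made for illustration, since the abstract states only that the mechanism learns from the ratio of completed to failed tasks. The output uses the standard Kubernetes `nodeAffinity` layout with `preferredDuringSchedulingIgnoredDuringExecution` and the well-known `kubernetes.io/hostname` node label.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class HostStats:
    """Task counters observed for one computing host (hypothetical structure)."""
    name: str
    completed: int
    failed: int


def host_score(stats: HostStats) -> float:
    """Return a success ratio in (0, 1]; higher means the host looks less overwhelmed.

    Assumption: adding 1 to the numerator and denominator gives a host with
    no task history the top score of 1.0, which mimics the stated preference
    for newer hosts and avoids a division by zero.
    """
    return (stats.completed + 1) / (stats.completed + stats.failed + 1)


def preferred_affinity(hosts: List[HostStats]) -> Dict:
    """Build a Kubernetes affinity stanza that prefers higher-scoring hosts."""
    terms = []
    for host in sorted(hosts, key=host_score, reverse=True):
        # Kubernetes requires preference weights in the range 1..100.
        weight = max(1, min(100, round(host_score(host) * 100)))
        terms.append({
            "weight": weight,
            "preference": {
                "matchExpressions": [{
                    "key": "kubernetes.io/hostname",
                    "operator": "In",
                    "values": [host.name],
                }]
            },
        })
    return {"nodeAffinity": {"preferredDuringSchedulingIgnoredDuringExecution": terms}}


# Example with made-up counters: one healthy host, one struggling host, one new host.
example_hosts = [
    HostStats("node-a", completed=90, failed=10),
    HostStats("node-b", completed=5, failed=15),
    HostStats("node-new", completed=0, failed=0),
]
example_affinity = preferred_affinity(example_hosts)
```

In this example the freshly added `node-new` receives the highest weight, the healthy `node-a` comes next, and the failure-prone `node-b` comes last. A soft (preferred) affinity is used rather than a hard requirement so that executor pods can still be scheduled when every host is busy. Whether the paper uses soft or hard affinities is not stated in the abstract.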