Scheduling-Guided Automatic Processing of Massive Hyperspectral Image Classification on Cloud Computing Architectures

Zebin Wu, Jin Sun, Yi Zhang, Yaoqin Zhu, Jun Li, Antonio Plaza, Jón Atli Benediktsson, Zhihui Wei
DOI: https://doi.org/10.1109/tcyb.2020.3026673
IF: 11.8
2021-07-01
IEEE Transactions on Cybernetics
Abstract: The large data volume and high algorithm complexity of hyperspectral image (HSI) problems have posed big challenges for efficient classification of massive HSI data repositories. Recently, cloud computing architectures have become more relevant to address the big computational challenges introduced in the HSI field. This article proposes an acceleration method for HSI classification that relies on scheduling metaheuristics to automatically and optimally distribute the workload of HSI applications across multiple computing resources on a cloud platform. By analyzing the procedure of a representative classification method, we first develop its distributed and parallel implementation based on the MapReduce mechanism on Apache Spark. The subtasks of the processing flow that can be processed in a distributed way are identified as divisible tasks. The optimal execution of this application on Spark is further formulated as a divisible scheduling framework that takes into account both task execution precedences and task divisibility when allocating the divisible and indivisible subtasks onto computing nodes. The formulated scheduling framework is an optimization procedure that searches for optimized task assignments and partition counts for divisible tasks. Two metaheuristic algorithms are developed to solve this divisible scheduling problem. The scheduling results provide an optimized solution to the automatic processing of HSI big data on clouds, improving the computational efficiency of HSI classification by exploring the parallelism during the parallel processing flow. Experimental results demonstrate that our scheduling-guided approach achieves remarkable speedups by facilitating the automatic processing of HSI classification on Spark, and is scalable to the increasing HSI data volume.
automation & control systems, computer science, cybernetics, artificial intelligence
What problem does this paper attempt to address?
This paper addresses the big-data processing challenge in hyperspectral image (HSI) classification: large data volumes and high algorithm complexity make it hard to classify massive HSI repositories efficiently. The authors propose an acceleration method that uses scheduling metaheuristics to automatically and optimally distribute the workload of HSI applications across computing resources on a cloud platform. They first analyze a representative classification method and develop a distributed, parallel implementation of it on Apache Spark based on the MapReduce mechanism; subtasks that can be processed in a distributed way are identified as divisible tasks. They then formulate the optimal execution of this application on Spark as a divisible scheduling framework that accounts for both task execution precedences and task divisibility when allocating divisible and indivisible subtasks to computing nodes. Finally, two metaheuristic algorithms solve this scheduling problem, searching for optimized task assignments and partition counts for the divisible tasks. In short, the paper aims to improve the efficiency of HSI classification on cloud platforms by optimizing the scheduling strategy, particularly for large-scale datasets.
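To make the idea of divisible scheduling more concrete, here is a minimal Python sketch. It is not the authors' formulation or either of their metaheuristics, which the abstract does not name. Everything in it is an assumption made for illustration: the four-task workflow, the work amounts, the cost model (a task split into k partitions takes work / k plus an overhead that grows with k), and the use of a plain stochastic hill climber as the search method. It only shows the kind of search space the abstract describes: choosing a partition count for each divisible task while respecting precedence constraints.

```python
import random

# Hypothetical toy workflow: name -> (work, divisible?, predecessors).
TASKS = {
    "load":     (4.0,  False, []),
    "features": (40.0, True,  ["load"]),
    "classify": (60.0, True,  ["features"]),
    "merge":    (3.0,  False, ["classify"]),
}
ORDER = ["load", "features", "classify", "merge"]  # a valid topological order
NUM_NODES = 8    # assumed number of computing nodes
OVERHEAD = 1.0   # assumed extra cost per additional partition


def makespan(partitions):
    """Simulate a greedy list schedule and return the overall finish time."""
    node_free = [0.0] * NUM_NODES  # time at which each node becomes idle
    finish = {}                    # task name -> finish time
    for name in ORDER:
        work, divisible, preds = TASKS[name]
        k = partitions[name] if divisible else 1
        # A task may start only after all of its predecessors have finished.
        ready = max((finish[p] for p in preds), default=0.0)
        # Assumed cost model: equal chunks plus overhead growing with k.
        chunk = work / k + OVERHEAD * (k - 1)
        # Place the k chunks on the k nodes that become idle first.
        chosen = sorted(range(NUM_NODES), key=lambda n: node_free[n])[:k]
        end = 0.0
        for n in chosen:
            node_free[n] = max(node_free[n], ready) + chunk
            end = max(end, node_free[n])
        finish[name] = end
    return max(finish.values())


def hill_climb(iterations=2000, seed=0):
    """Stochastic hill climbing over the partition counts of divisible tasks."""
    rng = random.Random(seed)
    divisible = [n for n in ORDER if TASKS[n][1]]
    best = {n: 1 for n in divisible}  # start with no splitting at all
    best_cost = makespan(best)
    for _ in range(iterations):
        cand = dict(best)
        name = rng.choice(divisible)
        # Nudge one partition count by +/-1, clamped to [1, NUM_NODES].
        cand[name] = min(NUM_NODES, max(1, cand[name] + rng.choice([-1, 1])))
        cost = makespan(cand)
        if cost <= best_cost:  # keep the move if it is not worse
            best, best_cost = cand, cost
    return best, best_cost


if __name__ == "__main__":
    baseline = makespan({"features": 1, "classify": 1})
    best, cost = hill_climb()
    print("unsplit makespan:", baseline)
    print("tuned partitions:", best, "makespan:", round(cost, 3))
```

In this toy model, splitting a task further eventually stops paying off because the per-partition overhead outweighs the shorter chunks, so the best partition count differs from task to task. That trade-off is why partition counts are worth searching over rather than fixing at the number of nodes.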