Scheduling-Guided Automatic Processing of Massive Hyperspectral Image Classification on Cloud Computing Architectures

Zebin Wu, Jin Sun, Yi Zhang, Yaoqin Zhu, Jun Li, Antonio Plaza, Jón Atli Benediktsson, Zhihui Wei

DOI: https://doi.org/10.1109/tcyb.2020.3026673

IF: 11.8

2021-07-01

IEEE Transactions on Cybernetics

Abstract: The large data volume and high algorithm complexity of hyperspectral image (HSI) problems have posed big challenges for efficient classification of massive HSI data repositories. Recently, cloud computing architectures have become more relevant to address the big computational challenges introduced in the HSI field. This article proposes an acceleration method for HSI classification that relies on scheduling metaheuristics to automatically and optimally distribute the workload of HSI applications across multiple computing resources on a cloud platform. By analyzing the procedure of a representative classification method, we first develop its distributed and parallel implementation based on the MapReduce mechanism on Apache Spark. The subtasks of the processing flow that can be processed in a distributed way are identified as divisible tasks. The optimal execution of this application on Spark is further formulated as a divisible scheduling framework that takes into account both task execution precedences and task divisibility when allocating the divisible and indivisible subtasks onto computing nodes. The formulated scheduling framework is an optimization procedure that searches for optimized task assignments and partition counts for divisible tasks. Two metaheuristic algorithms are developed to solve this divisible scheduling problem. The scheduling results provide an optimized solution to the automatic processing of HSI big data on clouds, improving the computational efficiency of HSI classification by exploring the parallelism during the parallel processing flow. Experimental results demonstrate that our scheduling-guided approach achieves remarkable speedups by facilitating the automatic processing of HSI classification on Spark, and is scalable to the increasing HSI data volume.
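To make the notion of a divisible task concrete, the following is a minimal sketch in plain Python (no Spark dependency) of a map-and-reduce pass over pixel partitions. The nearest-class-mean classifier, the function names `split`, `classify_chunk`, and `classify`, and the sample data are all assumptions made for illustration; the abstract does not say which classifier the paper uses.

```python
from functools import reduce


def split(pixels, n_parts):
    """Cut the pixel list into n_parts nearly equal contiguous chunks."""
    size, extra = divmod(len(pixels), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(pixels[start:end])
        start = end
    return chunks


def classify_chunk(chunk, class_means):
    """Map step: label each pixel with the index of its nearest class mean."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(class_means)), key=lambda k: dist2(p, class_means[k]))
        for p in chunk
    ]


def classify(pixels, class_means, n_parts):
    """Run the map step on every partition, then reduce by concatenation."""
    partial = map(lambda c: classify_chunk(c, class_means), split(pixels, n_parts))
    return reduce(lambda a, b: a + b, partial, [])


# Hypothetical two-band "spectra" and two class means, for illustration only.
pixels = [(0.1, 0.2), (0.9, 1.0), (0.0, 0.0), (1.1, 0.8), (0.2, 0.1)]
class_means = [(0.0, 0.0), (1.0, 1.0)]
labels = classify(pixels, class_means, n_parts=2)
```

Because each pixel is labeled independently, the result does not depend on `n_parts`; that independence is what makes the partition count a free variable a scheduler can tune.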

automation & control systems, computer science, cybernetics, artificial intelligence

What problem does this paper attempt to address?

This paper addresses the big-data processing challenge in hyperspectral image (HSI) classification: the large data volume and high algorithmic complexity of HSI problems make efficient classification of massive repositories difficult. To address this, the authors propose an acceleration method that uses scheduling metaheuristics to automatically and optimally distribute the workload of HSI applications across computing resources on a cloud platform. They first analyze the procedure of a representative classification method and develop its distributed, parallel implementation on Apache Spark based on the MapReduce mechanism. The optimal execution of this application on Spark is then formulated as a divisible scheduling framework that accounts for both task execution precedence and task divisibility when allocating divisible and indivisible subtasks to computing nodes. Finally, two metaheuristic algorithms solve this divisible scheduling problem, yielding optimized solutions that improve the computational efficiency of HSI big-data processing on the cloud by exploiting the parallelism in the processing flow. In short, the paper aims to improve the efficiency of HSI classification in cloud computing environments by optimizing the scheduling strategy, particularly for large-scale datasets.
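The scheduling idea can be illustrated with a toy model. The sketch below searches for partition counts of divisible tasks that minimize the makespan of a small precedence-constrained workflow. Everything in it is an assumption for illustration: the task names and workloads, the node count, the per-partition overhead, and the simple local search, which stands in for the paper's two metaheuristics (the abstract does not name them).

```python
import random

# Hypothetical workflow in topological order: name -> (work, divisible, predecessors).
TASKS = {
    "load":     (4.0,  False, []),
    "features": (24.0, True,  ["load"]),
    "train":    (6.0,  False, ["features"]),
    "classify": (36.0, True,  ["train"]),
    "merge":    (2.0,  False, ["classify"]),
}
NODES = 4        # assumed number of computing nodes
OVERHEAD = 0.5   # assumed extra cost per additional partition


def makespan(parts):
    """Finish time when each divisible task t is cut into parts[t] pieces."""
    node_free = [0.0] * NODES
    finish = {}
    for name, (work, divisible, preds) in TASKS.items():
        # A task may start only after all of its predecessors have finished.
        ready = max((finish[p] for p in preds), default=0.0)
        k = parts.get(name, 1) if divisible else 1
        piece = work / k + OVERHEAD * (k - 1)
        # Place the k pieces on the k nodes that become free earliest.
        chosen = sorted(range(NODES), key=lambda n: node_free[n])[:k]
        ends = []
        for n in chosen:
            start = max(ready, node_free[n])
            node_free[n] = start + piece
            ends.append(node_free[n])
        finish[name] = max(ends)
    return max(finish.values())


def local_search(iterations=200, seed=0):
    """Stand-in search: change one partition count at a time, keep improvements."""
    rng = random.Random(seed)
    divisible = [t for t, (_, d, _) in TASKS.items() if d]
    best = {t: 1 for t in divisible}
    best_cost = makespan(best)
    for _ in range(iterations):
        cand = dict(best)
        cand[rng.choice(divisible)] = rng.randint(1, NODES)
        cost = makespan(cand)
        if cost < best_cost:
            best, best_cost = cand, cost
    return best, best_cost


serial_cost = makespan({})
plan, plan_cost = local_search()
```

In this toy model the fully serial plan takes 72 time units, while splitting both divisible tasks four ways takes 30. The overhead term is what keeps "always use the maximum partition count" from being trivially optimal in general, which is why the partition count is treated as a search variable.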


Multiobjective Task Scheduling for Energy-Efficient Cloud Implementation of Hyperspectral Image Classification

Recent Developments in Parallel and Distributed Computing for Remotely Sensed Big Data Processing

A Mechanism of Remote Sensing Image for Parallel Processing Base on Splitting Blocks

An Efficient Organization Method for Large-Scale and Long Time-Series Remote Sensing Data in a Cloud Computing Environment

Research on remote sensing image storage management and a fast visualization system based on cloud computing technology

Adaptive priority-based data placement and multi-task scheduling in geo-distributed cloud systems

CLUSTER-BASED OCEAN REMOTE SENSING IMAGE FUSION PARALLEL COMPUTING STRATEGY

Improved Hungarian algorithm–based task scheduling optimization strategy for remote sensing big data processing

Efficient Management and Scheduling of Massive Remote Sensing Image Datasets

An Efficient Task Scheduling Based on Hybrid Bird Swarm Flow Directional Model in Cloud Computing Environment

Data-Intensive Task Scheduling for Heterogeneous Big Data Analytics in IoT System

Optimizing Multi-Cloud CDN Deployment and Scheduling Strategies Using Big Data Analysis

Efficient Resource Scheduling for Big Data Processing in Cloud Platform

A NEW CLOUD-EDGE-TERMINAL RESOURCES COLLABORATIVE SCHEDULING FRAMEWORK FOR MULTI-LEVEL VISUALIZATION TASKS OF LARGE-SCALE SPATIO-TEMPORAL DATA

Performance optimization of computing task scheduling based on the Hadoop big data platform

Big Data Processing Workflows Oriented Real-Time Scheduling Algorithm using Task-Duplication in Geo-Distributed Clouds

Cloud Computing in Remote Sensing: High Performance Remote Sensing Data Processing in a Big data Environment

Cost-Efficient Workflow Scheduling Algorithm for Applications With Deadline Constraint on Heterogeneous Clouds

A hybrid meta-heuristic algorithm for scientific workflow scheduling in heterogeneous distributed computing systems

A new architecture paradigm for image processing pipeline applied to massive remote sensing data production