Abstract: Ecological sciences are using imagery from a variety of sources to monitor and survey populations and ecosystems. Very High Resolution (VHR) satellite imagery provides an effective dataset for large-scale surveys. Convolutional Neural Networks have been successfully employed to analyze such imagery and detect large animals. As datasets grow in volume, O(TB), and in number of images, O(1k), using High Performance Computing (HPC) resources becomes necessary. In this paper, we investigate task-parallel, data-driven workflow designs to support imagery analysis pipelines with heterogeneous tasks on HPC. We analyze the capabilities of each design when processing a dataset of 3,000 VHR satellite images totaling 4 TB. We experimentally model the execution time of the tasks of the image processing pipeline. We perform experiments to characterize the resource utilization, total time to completion, and overheads of each design. Based on the model and on the overhead and utilization analysis, we show which design approach is best suited to scientific pipelines with similar characteristics.
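
The execution-time modeling mentioned in the abstract can be illustrated with an ordinary least-squares fit of task time against image size. This is a minimal sketch under one assumption: time is linear in image size, t = a·s + b. That matches the linear relationship reported in the summary of the paper. The helpers `fit_linear` and `predict_time` are hypothetical. The (size, time) measurements would come from your own runs, not from the paper.

```python
from typing import Sequence, Tuple


def fit_linear(sizes: Sequence[float], times: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of time = a * size + b.

    Assumption: task execution time is modeled as linear in image size.
    Returns the slope a and the intercept b.
    """
    n = len(sizes)
    if n != len(times) or n < 2:
        raise ValueError("need at least two (size, time) pairs of equal length")
    mean_x = sum(sizes) / n
    mean_y = sum(times) / n
    # Spread of the sizes around their mean; zero means the slope is undefined.
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    if sxx == 0:
        raise ValueError("image sizes must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, times))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def predict_time(a: float, b: float, size: float) -> float:
    """Predicted execution time for an image of the given size."""
    return a * size + b
```

With a fitted model, `predict_time` gives a per-image estimate that could, for example, inform how images are assigned to resources. Whether the paper uses its model in that way is not stated here.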

What problem does this paper attempt to address?

The paper addresses how to design and implement an efficient computing framework that executes heterogeneous tasks on high-performance computing (HPC) resources when processing large-scale satellite image datasets. Specifically, it focuses on designing and optimizing the very high-resolution satellite image analysis pipelines used in ecological research. As datasets grow in volume (to the terabyte scale) and in number of images (to the thousands), HPC resources become necessary. The paper experimentally models task execution times and analyzes the resource utilization, total time to completion, and overhead of each design. On that basis, it aims to identify the design best suited to scientific pipelines with similar characteristics.

### Main contributions of the paper

1. **Improvement indications for the workflow engine**: Specific suggestions for further developing the workflow engine so that it maximizes resource utilization while minimizing workflow completion time.
2. **Design guidelines**: Concrete design guidelines for task-based computing frameworks that support data-driven, compute-intensive workflows on HPC resources.
3. **Experimental comparison method**: An experiment-based method for comparing the performance of different designs. The method does not depend on any specific use case or computing framework.

### Use cases

The paper uses an Antarctic seal survey as its example and analyzes 3,097 satellite images with a total volume of approximately 4 TB. This use case requires processing the images repeatedly, running both CPU and GPU code, and exchanging several gigabytes of data.
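
For a sense of scale, the use-case figures imply roughly 1.3 GB per image on average. The sketch below makes two assumptions: decimal units (1 TB = 1,000 GB) and "approximately 4 TB" treated as exactly 4 TB. The helper `average_image_size_gb` is hypothetical, and the result is only a mean, not the size of any particular image.

```python
def average_image_size_gb(total_tb: float, n_images: int) -> float:
    """Mean per-image volume in GB, assuming decimal units (1 TB = 1000 GB)."""
    if n_images <= 0:
        raise ValueError("n_images must be positive")
    return total_tb * 1000 / n_images


# Assumption: "approximately 4 TB" is taken as exactly 4 TB for this estimate.
avg = average_image_size_gb(4.0, 3097)
print(f"{avg:.2f} GB per image")  # → 1.29 GB per image
```
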
### Workflow design

The paper explores two main workflow designs:

- **Design 1: One pipeline per image**: Each pipeline consists of two stages, and each stage contains one type of task. Tasks in the first stage take an image as input and generate slices of it. Tasks in the second stage take those slices as input, count the seals in each slice, and output the result for the entire image.
- **Design 2: One pipeline for multiple images**: A queuing mechanism is introduced, and once started, tasks keep executing until resources are exhausted. Data and control signals are passed between tasks through queues.

### Experimental results

The paper evaluates the designs through three experiments:

1. **Task execution time**: The relationship between task execution time and image size is analyzed, and execution time is found to scale linearly with image size.
2. **Resource utilization**: The total resource utilization of each design is measured.
3. **Middleware overhead**: The middleware overhead incurred in implementing each design is characterized.

These experiments yield a performance comparison of the designs, which provides a basis for choosing the one best suited to a given scientific pipeline.
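
The two designs can be contrasted in a minimal sketch. Python threads and in-process queues stand in for HPC tasks and the middleware's communication channels. This is an illustration under stated assumptions, not the paper's implementation or any workflow engine's API:

- Images are modeled as flat lists of integers, where 1 marks a seal.
- `tile_image`, `count_seals`, `design1` and `design2` are hypothetical names.
- The tile size and worker counts are arbitrary illustrative parameters.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

Image = List[int]  # hypothetical toy image: flat list of "pixels", 1 marks a seal
_DONE = object()   # sentinel that signals the end of a queue's stream


def tile_image(image: Image, tile_size: int) -> List[Image]:
    """Stage 1 (hypothetical): split an image into consecutive slices."""
    return [image[i:i + tile_size] for i in range(0, len(image), tile_size)]


def count_seals(tile: Image) -> int:
    """Stage 2 (hypothetical): count detections in one slice."""
    return sum(1 for px in tile if px == 1)


def design1(images: Dict[str, Image], tile_size: int, workers: int = 4) -> Dict[str, int]:
    """Design 1: an independent two-stage pipeline for every image."""

    def pipeline(image: Image) -> int:
        tiles = tile_image(image, tile_size)       # stage 1: slicing
        return sum(count_seals(t) for t in tiles)  # stage 2: counting

    # Each image's pipeline is submitted as its own unit of work.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {name: pool.submit(pipeline, img) for name, img in images.items()}
        return {name: fut.result() for name, fut in futures.items()}


def design2(images: Dict[str, Image], tile_size: int, n_counters: int = 2) -> Dict[str, int]:
    """Design 2: one pipeline for all images; long-running tasks linked by queues."""
    image_q: queue.Queue = queue.Queue()
    tile_q: queue.Queue = queue.Queue()
    results = {name: 0 for name in images}
    lock = threading.Lock()

    def tiler() -> None:
        # Pulls image names until the sentinel arrives, then stops every counter.
        while True:
            name = image_q.get()
            if name is _DONE:
                for _ in range(n_counters):
                    tile_q.put(_DONE)
                return
            for tile in tile_image(images[name], tile_size):
                tile_q.put((name, tile))

    def counter() -> None:
        # Pulls (image name, slice) pairs until it receives a sentinel.
        while True:
            item = tile_q.get()
            if item is _DONE:
                return
            name, tile = item
            found = count_seals(tile)
            with lock:
                results[name] += found

    threads = [threading.Thread(target=tiler)]
    threads += [threading.Thread(target=counter) for _ in range(n_counters)]
    for t in threads:
        t.start()
    for name in images:
        image_q.put(name)
    image_q.put(_DONE)
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    demo = {"img_a": [1, 0, 1, 1], "img_b": [0, 0, 1]}
    print(design1(demo, tile_size=2))  # {'img_a': 3, 'img_b': 1}
    print(design2(demo, tile_size=2))  # {'img_a': 3, 'img_b': 1}
```

In this sketch, Design 1 creates a short-lived pipeline for each image. Design 2 instead keeps a fixed set of long-running workers busy by streaming images and slices through queues, with a sentinel value acting as the control signal that ends the stream.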

Workflow Design Analysis for High Resolution Satellite Image Analysis
