Abstract: Current trends in scientific imaging are challenged by the emerging need of integrating sophisticated machine learning with Big Data analytics platforms. This work proposes an in-memory distributed learning architecture for enabling sophisticated learning and optimization techniques on scientific imaging problems, which are characterized by the combination of variant information from different origins. We apply the resulting, Spark-compliant, architecture on two emerging use cases from the scientific imaging domain, namely: (a) the space variant deconvolution of galaxy imaging surveys (astrophysics), (b) the super-resolution based on coupled dictionary training (remote sensing). We conduct evaluation studies considering relevant datasets, and the results report at least 60% improvement in time response against the conventional computing solutions. Ultimately, the offered discussion provides useful practical insights on the impact of key Spark tuning parameters on the speedup achieved, and the memory/disk footprint.

What problem does this paper attempt to address?

The paper addresses how to use a distributed learning architecture to process large-scale datasets and run efficient learning and optimization techniques in scientific imaging. Specifically, it proposes an in-memory distributed learning architecture that targets two challenges of scientific imaging problems: large data volumes and data drawn from diverse sources. The paper demonstrates the architecture on two use cases:

1. **Space-Variant Deconvolution in Astrophysics**: The paper explores how to remove the distortion introduced by the telescope point spread function (PSF) in galaxy imaging surveys. The PSF describes the response of the imaging system to a point source, and removing its effect (deconvolution) is a difficult problem, especially in the presence of random noise. The paper formulates the recovery of the original galaxy images as an optimization problem that minimizes the residual:

   \[ \arg\min_X \frac{1}{2} \|Y - H(X)\|_2^2 \]

   where \(Y\) denotes the observed noisy galaxy images, \(X\) denotes the true galaxy images, and \(H(X)\) represents the convolution of each galaxy image with the PSF at its corresponding position. To stabilize the problem and obtain a unique solution, a regularization term, such as a sparsity or low-rank constraint, needs to be added.

2. **Super-Resolution Based on Coupled Dictionary Training**: The paper also studies how to train dictionaries jointly on low-resolution and high-resolution data in order to super-resolve images. This use case involves recovering high-resolution images from low-resolution ones, which is of particular interest in remote sensing.
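The data-fidelity term in the deconvolution use case can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: it assumes circular (FFT-based) convolution, uses hypothetical Gaussian PSFs as stand-ins for real telescope PSFs, omits the regularization term, and runs plain gradient descent on a single machine rather than on Spark. All function names are illustrative.

```python
import numpy as np

def gaussian_psf(size, sigma):
    """Hypothetical stand-in PSF: a normalized, centered 2D Gaussian."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    psf = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return psf / psf.sum()

def psf_to_otf(psfs):
    """FFT of each PSF after moving its center to pixel (0, 0)."""
    return np.fft.fft2(np.fft.ifftshift(psfs, axes=(-2, -1)))

def H(X, otf):
    """Convolve every image in the stack X with its own PSF (circular)."""
    return np.real(np.fft.ifft2(np.fft.fft2(X) * otf))

def H_adjoint(R, otf):
    """Adjoint of H: correlate every image with its own PSF."""
    return np.real(np.fft.ifft2(np.fft.fft2(R) * np.conj(otf)))

def objective(X, Y, otf):
    """0.5 * ||Y - H(X)||_2^2, summed over the whole stack."""
    return 0.5 * np.sum((Y - H(X, otf)) ** 2)

def deconvolve(Y, psfs, n_iter=200):
    """Unregularized gradient descent on the objective above."""
    otf = psf_to_otf(psfs)
    # A step of 1/L, with L the largest squared frequency response,
    # guarantees that the objective does not increase.
    step = 1.0 / np.max(np.abs(otf) ** 2)
    X = np.zeros_like(Y)
    for _ in range(n_iter):
        grad = H_adjoint(H(X, otf) - Y, otf)  # gradient of the objective
        X = X - step * grad
    return X

# Toy usage: three "galaxies", each blurred by a different PSF.
rng = np.random.default_rng(0)
psfs = np.stack([gaussian_psf(32, s) for s in (1.0, 1.5, 2.0)])
X_true = rng.random((3, 32, 32))
Y = H(X_true, psf_to_otf(psfs)) + 0.01 * rng.standard_normal((3, 32, 32))
X_hat = deconvolve(Y, psfs)
```

Because each image is paired with its own PSF, the per-image work is independent, which is the property a distributed implementation can exploit. Without a regularizer the iterates will eventually amplify noise, which is why the formulation above calls for a sparsity or low-rank term.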
The paper evaluates the proposed distributed learning architecture experimentally on relevant datasets, and the results show an improvement in time response of at least 60% over conventional computing solutions. The paper also discusses how key Spark tuning parameters affect the achieved speedup and the memory/disk footprint, which offers practical guidance for deployment.
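The summary does not state how the 60% figure is measured. As a rough aid for reading it, the sketch below assumes it is a reduction in wall-clock time relative to the conventional baseline and converts between that reduction and the equivalent speedup factor. The function names are illustrative, and this interpretation is an assumption rather than something the paper confirms.

```python
def speedup_from_time_reduction(reduction):
    """Speedup factor implied by a fractional reduction in run time.

    A reduction of r means the new run time is (1 - r) times the old one,
    so the speedup is old / new = 1 / (1 - r).
    """
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must lie in [0, 1)")
    return 1.0 / (1.0 - reduction)

def time_reduction_from_speedup(speedup):
    """Inverse conversion: fractional time saved for a given speedup."""
    if speedup < 1.0:
        raise ValueError("speedup must be at least 1")
    return 1.0 - 1.0 / speedup

# Under the stated assumption, a 60% time reduction is about a 2.5x speedup.
implied_speedup = speedup_from_time_reduction(0.60)
```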

A Distributed Learning Architecture for Scientific Imaging Problems
