A Distributed Learning Architecture for Scientific Imaging Problems

A. Panousopoulou, S. Farrens, K. Fotiadou, A. Woiselle, G. Tsagkatakis, J-L. Starck, P. Tsakalides
DOI: https://doi.org/10.48550/arXiv.1809.05956
2018-09-28
Abstract: Current trends in scientific imaging are challenged by the emerging need of integrating sophisticated machine learning with Big Data analytics platforms. This work proposes an in-memory distributed learning architecture for enabling sophisticated learning and optimization techniques on scientific imaging problems, which are characterized by the combination of variant information from different origins. We apply the resulting, Spark-compliant, architecture on two emerging use cases from the scientific imaging domain, namely: (a) the space variant deconvolution of galaxy imaging surveys (astrophysics), (b) the super-resolution based on coupled dictionary training (remote sensing). We conduct evaluation studies considering relevant datasets, and the results report at least 60% improvement in time response against the conventional computing solutions. Ultimately, the offered discussion provides useful practical insights on the impact of key Spark tuning parameters on the speedup achieved, and the memory/disk footprint.
Distributed, Parallel, and Cluster Computing
What problem does this paper attempt to address?
The problem this paper addresses is how to use distributed learning architectures to process large-scale datasets and run sophisticated learning and optimization techniques efficiently in scientific imaging. Specifically, the paper proposes an in-memory, Spark-compliant distributed learning architecture designed for the large data volumes and heterogeneous data sources that characterize scientific imaging problems. It demonstrates the architecture on two use cases:

1. **Space-Variant Deconvolution in Astrophysics**: The paper explores how to remove the distortion that the telescope point-spread function (PSF) introduces into galaxy imaging surveys. The PSF describes the response of the imaging system to a point source, and removing its effect (i.e., deconvolution) is an ill-posed inverse problem, especially in the presence of random noise. The galaxy images are recovered by minimizing the residual
   \[ \arg\min_X \frac{1}{2} \|Y - H(X)\|_2^2 \]
   where \(Y\) denotes the observed noisy galaxy images, \(X\) the true galaxy images, and \(H(X)\) the convolution of each galaxy image with the PSF at its corresponding position (hence "space-variant"). To stabilize the problem and obtain a unique solution, a regularization term, such as a sparsity or a low-rank prior, must be added.
2. **Super-Resolution Based on Coupled Dictionary Training**: The paper also studies how to jointly train dictionaries on low-resolution and high-resolution data so that high-resolution images can be recovered from low-resolution observations, which is particularly important in remote sensing.

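To make the deconvolution objective concrete, the following is a minimal NumPy sketch for a stack of galaxy stamps, each with its own PSF. It assumes circular (FFT-based) convolution, known PSFs centred at pixel (0, 0), and, purely as an illustrative stand-in for the regularization mentioned above, an l1 penalty on pixel values plus a positivity constraint, solved with forward-backward (proximal gradient) iterations. The Gaussian PSF model, the function names and the `lam`/`n_iter` parameters are hypothetical; this is not the paper's implementation, and its actual regularizers may differ.

```python
import numpy as np


def gaussian_psf(h, w, sigma):
    """Hypothetical normalised Gaussian PSF of size (h, w), peak shifted to (0, 0)."""
    yy, xx = np.mgrid[:h, :w]
    g = np.exp(-((yy - h // 2) ** 2 + (xx - w // 2) ** 2) / (2.0 * sigma ** 2))
    return np.fft.ifftshift(g / g.sum())


def H(X, psf_hats):
    """Forward operator: circularly convolve each stamp X[i] with its own PSF."""
    return np.real(np.fft.ifft2(np.fft.fft2(X) * psf_hats))


def H_adj(R, psf_hats):
    """Adjoint of H: circular correlation of each stamp with its PSF."""
    return np.real(np.fft.ifft2(np.fft.fft2(R) * np.conj(psf_hats)))


def deconvolve_stack(Y, psfs, lam=0.0, n_iter=500):
    """Forward-backward solver (illustrative regularizer, not the paper's) for
        min_X 1/2 ||Y - H(X)||_2^2 + lam * ||X||_1   subject to X >= 0.

    Y, psfs : arrays of shape (n_stamps, h, w); psfs[i] is the PSF of stamp i.
    """
    psf_hats = np.fft.fft2(psfs)          # FFT over the last two axes
    L = np.max(np.abs(psf_hats)) ** 2     # Lipschitz constant of the gradient
    step = 1.0 / L
    X = np.zeros_like(Y, dtype=float)
    for _ in range(n_iter):
        grad = H_adj(H(X, psf_hats) - Y, psf_hats)
        # Gradient step on the data term, then the prox of the l1 + positivity
        # terms, which is a one-sided soft threshold: max(z - step * lam, 0).
        X = np.maximum(X - step * grad - step * lam, 0.0)
    return X


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, h, w = 3, 16, 16
    psfs = np.stack([gaussian_psf(h, w, s) for s in (1.0, 1.5, 2.0)])
    X_true = rng.uniform(0.0, 1.0, size=(n, h, w))
    Y = H(X_true, np.fft.fft2(psfs)) + 0.01 * rng.standard_normal((n, h, w))
    X_hat = deconvolve_stack(Y, psfs, lam=0.001)
    res = np.linalg.norm(Y - H(X_hat, np.fft.fft2(psfs)))
    print("relative residual:", res / np.linalg.norm(Y))
```

Because every stamp is processed independently inside `H` and `H_adj`, the stack can in principle be split across workers, which is the kind of data parallelism a Spark-based architecture can exploit.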
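For the super-resolution use case, the following NumPy sketch illustrates coupled (joint) dictionary training in a common simplified form. Low- and high-resolution training patches are stacked so that each pair shares one sparse code; the stacked dictionary is learned by alternating ISTA sparse coding with a MOD-style least-squares update; and a new low-resolution patch is super-resolved by coding it with the low-resolution half and synthesizing with the high-resolution half. This formulation, the function names and the `lam`, `n_atoms` and iteration parameters are assumptions for illustration; the paper's coupled dictionary training algorithm may differ.

```python
import numpy as np


def soft_threshold(Z, t):
    """Element-wise soft thresholding, the prox of t * ||.||_1."""
    return np.sign(Z) * np.maximum(np.abs(Z) - t, 0.0)


def sparse_code(D, X, lam, n_iter=200):
    """ISTA for min_A 1/2 ||X - D A||_F^2 + lam ||A||_1 (signals are columns of X)."""
    L = max(np.linalg.norm(D, 2) ** 2, 1e-12)  # Lipschitz constant of the gradient
    A = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(n_iter):
        A = soft_threshold(A - D.T @ (D @ A - X) / L, lam / L)
    return A


def train_coupled_dictionaries(X_low, X_high, n_atoms, lam=0.05, n_outer=20, seed=0):
    """Hypothetical joint training: paired LR/HR patches share one sparse code.

    X_low : (d_low, n) low-resolution patches; X_high : (d_high, n) high-resolution patches.
    """
    rng = np.random.default_rng(seed)
    X = np.vstack([X_low, X_high])          # each column is one LR/HR training pair
    D = rng.standard_normal((X.shape[0], n_atoms))
    D /= np.linalg.norm(D, axis=0)
    for _ in range(n_outer):
        A = sparse_code(D, X, lam)          # shared codes for the LR and HR parts
        D = X @ np.linalg.pinv(A)           # MOD: least-squares dictionary update
        D /= np.maximum(np.linalg.norm(D, axis=0), 1e-12)  # unit-norm atoms
    d_low = X_low.shape[0]
    return D[:d_low], D[d_low:]


def super_resolve(D_low, D_high, x_low, lam=0.05):
    """Sparse-code an LR patch with D_low, then synthesize its HR estimate with D_high."""
    a = sparse_code(D_low, x_low.reshape(-1, 1), lam)
    return (D_high @ a).ravel()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_high = rng.standard_normal((16, 300))
    X_low = 0.5 * (X_high[0::2] + X_high[1::2])   # toy 2x downsampling by averaging
    D_low, D_high = train_coupled_dictionaries(X_low, X_high, n_atoms=32)
    print(super_resolve(D_low, D_high, X_low[:, 0]).shape)   # (16,)
```

Sharing the sparse code is what couples the two dictionaries: whatever atoms explain a low-resolution patch are assumed to explain its high-resolution counterpart as well.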
The paper evaluates the proposed architecture experimentally on relevant datasets, and the results show at least a 60% improvement in time response compared with conventional computing solutions. The paper also discusses how key Spark tuning parameters affect the achieved speedup and the memory/disk footprint, offering practical guidance for deploying such workloads.
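The headline figure can be related to the speedup with a little arithmetic. Assuming that "improvement in time response" means the relative reduction in wall-clock time with respect to the baseline, an improvement \(p\) corresponds to a speedup \(S = T_{\text{base}} / T_{\text{dist}} = 1 / (1 - p)\), so "at least 60%" means the distributed runs are at least 2.5 times faster. The helper names and the timings below are hypothetical, not measurements from the paper.

```python
def time_improvement(t_base, t_dist):
    """Relative reduction in wall-clock time w.r.t. the baseline (0.6 == 60%)."""
    if t_base <= 0 or t_dist <= 0:
        raise ValueError("timings must be positive")
    return (t_base - t_dist) / t_base


def speedup(t_base, t_dist):
    """Speedup factor S = T_base / T_dist."""
    if t_base <= 0 or t_dist <= 0:
        raise ValueError("timings must be positive")
    return t_base / t_dist


def speedup_from_improvement(p):
    """Speedup implied by a relative time reduction p, i.e. S = 1 / (1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    return 1.0 / (1.0 - p)


if __name__ == "__main__":
    # Hypothetical timings in seconds, not results from the paper.
    t_base, t_dist = 1200.0, 480.0
    print(f"improvement: {time_improvement(t_base, t_dist):.0%}")   # 60%
    print(f"speedup:     {speedup(t_base, t_dist):.2f}x")            # 2.50x
    print(f"60% reduction => at least {speedup_from_improvement(0.6):.2f}x")
```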