On the Architecture of a Big Data Classification Tool Based on a Map Reduce Approach for Hyperspectral Image Analysis

V. A. Ayma, R. S. Ferreira, P. N. Happ, D. A. B. Oliveira, G. A. O. P. Costa, R. Q. Feitosa, A. Plaza, P. Gamba
DOI: https://doi.org/10.1109/igarss.2015.7326066
2015-01-01
Abstract: Advances in remote sensors are providing exceptional quantities of large-scale data with increasing spatial, spectral and temporal resolutions, raising new challenges in their analysis, e.g., those present in classification processes. This work presents the architecture of the InterIMAGE Cloud Platform (ICP): Data Mining Package, a tool able to perform supervised classification procedures on huge amounts of data on a distributed infrastructure. The architecture is implemented on top of the MapReduce framework. The tool implements four classification algorithms taken from WEKA's machine learning library, namely Decision Trees, Naïve Bayes, Random Forest and Support Vector Machines (SVM). The SVM classifier was applied to datasets of different sizes (2 GB, 4 GB and 10 GB) on different cluster configurations (5, 10, 20 and 50 nodes). The results show the tool to be a potential approach to parallelizing classification processes on big data.
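The abstract says the tool runs supervised classification on top of MapReduce but does not detail the job layout. The following is a minimal sketch of the general pattern under stated assumptions: a model trained beforehand is shared with every mapper, each mapper classifies its own split of pixels and emits (label, 1) pairs, and a reducer sums those pairs into per-class pixel counts. It is plain Python that simulates the map, shuffle and reduce phases with threads; it is not Hadoop or WEKA code. `NearestCentroidModel` is a hypothetical stand-in for a trained classifier such as the SVM, and every name here is illustrative.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Sequence, Tuple

Pixel = Tuple[float, ...]  # one spectral vector: one value per band


class NearestCentroidModel:
    """Hypothetical toy stand-in for a pre-trained classifier (e.g., an SVM)."""

    def __init__(self, centroids: Dict[str, Pixel]):
        self.centroids = centroids

    def predict(self, pixel: Pixel) -> str:
        # Assign the label whose centroid is closest (squared Euclidean distance).
        return min(
            self.centroids,
            key=lambda label: sum(
                (a - b) ** 2 for a, b in zip(pixel, self.centroids[label])
            ),
        )


def map_split(model: NearestCentroidModel, split: Sequence[Pixel]) -> List[Tuple[str, int]]:
    # Map phase: each mapper classifies its own split independently
    # and emits one (predicted_label, 1) pair per pixel.
    return [(model.predict(p), 1) for p in split]


def shuffle(mapped: List[List[Tuple[str, int]]]) -> Dict[str, List[int]]:
    # Shuffle phase: group all emitted values by key (the class label).
    groups: Dict[str, List[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_counts(groups: Dict[str, List[int]]) -> Dict[str, int]:
    # Reduce phase: sum the values for each label to get per-class pixel counts.
    return {key: sum(values) for key, values in groups.items()}


def classify_mapreduce(
    model: NearestCentroidModel, pixels: List[Pixel], num_splits: int
) -> Dict[str, int]:
    # Partition the input into roughly equal splits (ceiling division),
    # mimicking how input data is divided among cluster nodes.
    size = max(1, -(-len(pixels) // num_splits))
    splits = [pixels[i:i + size] for i in range(0, len(pixels), size)]
    # Threads play the role of mappers running on separate nodes.
    with ThreadPoolExecutor(max_workers=num_splits) as pool:
        mapped = list(pool.map(lambda s: map_split(model, s), splits))
    return reduce_counts(shuffle(mapped))
```

For example, with two hypothetical classes, `model = NearestCentroidModel({"water": (0.1, 0.1, 0.0), "vegetation": (0.2, 0.8, 0.3)})` and `pixels = [(0.1, 0.1, 0.05), (0.25, 0.75, 0.3), (0.0, 0.2, 0.0), (0.2, 0.9, 0.4)]`, the call `classify_mapreduce(model, pixels, num_splits=2)` returns `{"water": 2, "vegetation": 2}`. Because classifying one pixel does not depend on any other pixel, the splits can be processed independently, which is what lets this kind of job spread across more nodes as the data grows.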