Parallel PLS Algorithm Using MapReduce and Its Application in Spectral Modeling

Yang Hui-hua, Du Ling-ling, Li Ling-qiao, Tang Tian-biao, Guo Tuo, Liang Qiong-lin, Wang Yi-ming, Luo Guo-an
DOI: https://doi.org/10.3964/j.issn.1000-0593(2012)09-2399-06
2012-01-01
Abstract: Partial least squares (PLS) is widely used in spectral analysis and modeling, but it is computationally intensive and time-consuming when applied to massive data. To solve this problem, a novel parallel PLS algorithm using MapReduce is proposed. It consists of two procedures: the parallelization of data standardization and the parallelization of principal component computation. Using NIR spectral modeling as an example, experiments were conducted on a Hadoop cluster built from ordinary computers. The experimental results demonstrate that the proposed parallel PLS algorithm can handle massive spectral data, significantly reduces modeling time, achieves a nearly linear speedup, and scales up easily.
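The two procedures named in the abstract can be illustrated with a minimal single-machine sketch in Python. This is not the authors' Hadoop implementation: the function names, the row-block data layout, and the use of the sample standard deviation are all assumptions made for illustration. The sketch relies on the fact that column sums, sums of squares, and the cross-product X^T y are additive over row blocks, so each block can be processed independently (map) and the partial results summed (reduce). The weight vector w = X^T y / ||X^T y|| is the standard first step of PLS with a single response; whether the paper parallelizes exactly this quantity is an assumption.

```python
from functools import reduce
from math import sqrt


def map_partial_sums(block):
    """Map step: row count, column sums, and column sums of squares for one block."""
    n_cols = len(block[0])
    sums = [0.0] * n_cols
    sq_sums = [0.0] * n_cols
    for row in block:
        for j, x in enumerate(row):
            sums[j] += x
            sq_sums[j] += x * x
    return len(block), sums, sq_sums


def reduce_partial_sums(a, b):
    """Reduce step: add two partial results element-wise."""
    n_a, s_a, q_a = a
    n_b, s_b, q_b = b
    return (
        n_a + n_b,
        [x + y for x, y in zip(s_a, s_b)],
        [x + y for x, y in zip(q_a, q_b)],
    )


def column_stats(blocks):
    """Global column means and sample standard deviations from block-wise sums."""
    n, sums, sq_sums = reduce(reduce_partial_sums, map(map_partial_sums, blocks))
    means = [s / n for s in sums]
    # Assumption: sample variance (divide by n - 1); clamp tiny negatives from rounding.
    stds = [sqrt(max((q - n * m * m) / (n - 1), 0.0)) for q, m in zip(sq_sums, means)]
    return means, stds


def standardize_block(block, means, stds):
    """Second map pass: center and scale each row of one block."""
    return [
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)]
        for row in block
    ]


def map_xty(block, y_block):
    """Map step: partial cross-product X^T y for one block of rows."""
    partial = [0.0] * len(block[0])
    for row, y in zip(block, y_block):
        for j, x in enumerate(row):
            partial[j] += x * y
    return partial


def first_weight(blocks, y_blocks):
    """Reduce partial X^T y vectors and normalize to unit length."""
    partials = [map_xty(b, yb) for b, yb in zip(blocks, y_blocks)]
    xty = reduce(lambda a, b: [x + y for x, y in zip(a, b)], partials)
    norm = sqrt(sum(v * v for v in xty))
    return [v / norm for v in xty]
```

In an actual MapReduce job, each call to a map function would run on a separate node over its own split of the spectra, and only the small partial vectors would travel to the reducer, which is what makes a nearly linear speedup plausible for this kind of computation.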