Generic Multiplicative Methods for Implementing Machine Learning Algorithms on MapReduce

Song Liu, Peter Flach, Nello Cristianini
DOI: https://doi.org/10.48550/arXiv.1111.2111
2011-12-02
Abstract: In this paper we introduce a generic model for multiplicative algorithms that is suitable for the MapReduce parallel programming paradigm. We implement three typical machine learning algorithms to demonstrate how similarity comparison, gradient descent, the power method, and other classic learning techniques fit this model well. Two versions of large-scale matrix multiplication are discussed, and different methods are developed for each case with regard to its computational characteristics and problem setting. In contrast to earlier research, we focus on fundamental linear algebra techniques that establish a generic approach for a range of algorithms, rather than on specific ways of scaling up algorithms one at a time. Experiments show promising results when evaluated on both speedup and accuracy. Compared with a standard implementation with computational complexity $O(m^3)$ in the worst case, the large-scale matrix multiplication experiments show that our design is considerably more efficient and maintains a good speedup as the number of cores increases. Algorithm-specific experiments also produce encouraging results on runtime performance.
Data Structures and Algorithms, Machine Learning
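
To make the idea of a multiplicative algorithm on MapReduce more concrete, the following is a minimal single-process sketch, not the authors' implementation. It assumes a square matrix stored as sparse `(row, col, value)` triples and expresses a matrix-vector product as a map step (emit one partial product per entry, keyed by row) followed by a reduce step (sum the partial products per row). The power method mentioned in the abstract is then just repeated application of that product. The names `map_phase`, `reduce_phase`, `matvec`, and `power_method` are hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


def map_phase(entries, v):
    """Map step: emit (row, a_ij * v_j) for every stored entry (i, j, a_ij)."""
    for i, j, a in entries:
        yield i, a * v[j]


def reduce_phase(pairs, n):
    """Reduce step: sum all partial products that share the same row key."""
    sums = defaultdict(float)
    for i, partial in pairs:
        sums[i] += partial
    return [sums[i] for i in range(n)]


def matvec(entries, v):
    """Matrix-vector product written as one map step plus one reduce step."""
    return reduce_phase(map_phase(entries, v), len(v))


def power_method(entries, n, iters=100):
    """Estimate the dominant eigenpair by repeated map/reduce products.

    Assumes the matrix is n x n and that the iterate never becomes the
    zero vector (no safeguard is included in this sketch).
    """
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        w = matvec(entries, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T A v for the unit-norm vector v.
    w = matvec(entries, v)
    eigenvalue = sum(a * b for a, b in zip(v, w))
    return eigenvalue, v
```

In a real MapReduce deployment the map and reduce steps would run on separate workers over partitions of the entries; here they are plain Python functions so that the data flow is easy to follow.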