Vectorized K-Means Algorithm on Heterogeneous CPU/MIC Architecture

Yusong TAN, Fuhui WU, Qingbo WU, Wei CHEN, Xiaoli SUN
DOI: https://doi.org/10.3778/j.issn.1673-9418.1312029
2014-01-01
Abstract: In the era of big data, K-Means is an important clustering algorithm in data mining. Processing massive, high-dimensional data places heavy performance demands on K-Means. The recently introduced MIC (Many Integrated Core) architecture provides both thread-level parallelism across cores and instruction-level parallelism within each core, which makes it a good candidate for accelerating the algorithm. This paper first describes the basic K-Means algorithm and analyzes its bottleneck. It then proposes a novel vectorized K-Means algorithm that optimizes the vector data layout strategy to achieve higher parallel performance. It also implements the vectorized algorithm on a heterogeneous CPU/MIC platform and explores MIC optimization strategies for non-traditional HPC (high performance computing) applications. Experimental results show that the vectorized K-Means algorithm delivers excellent performance and scalability.
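The abstract does not spell out the data layout it uses, so the following is only a minimal sketch of one common way a K-Means assignment step can be made friendly to vector units. It assumes a dimension-major (structure-of-arrays) layout in which coordinate d of all points is stored contiguously, so the innermost loop walks memory with unit stride. The function name `assign_clusters`, its parameters, and the layout itself are illustrative assumptions, not the authors' implementation.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Assignment step of K-Means on an ASSUMED dimension-major layout.
 *   points:    dims * n floats, points[d * n + i] is coordinate d of point i
 *   centroids: k * dims floats, centroids[c * dims + d] is coordinate d of centroid c
 *   scratch:   n floats of working space (squared distance to the current centroid)
 *   best_dist: n floats, receives the squared distance to the nearest centroid
 *   labels:    n ints, receives the index of the nearest centroid
 */
void assign_clusters(const float *points, const float *centroids,
                     size_t n, size_t dims, size_t k,
                     float *scratch, float *best_dist, int *labels)
{
    assert(n > 0 && dims > 0 && k > 0);

    for (size_t i = 0; i < n; ++i) {
        best_dist[i] = FLT_MAX;
        labels[i] = 0;
    }

    for (size_t c = 0; c < k; ++c) {
        for (size_t i = 0; i < n; ++i) {
            scratch[i] = 0.0f;
        }

        /* Accumulate squared differences one dimension at a time. */
        for (size_t d = 0; d < dims; ++d) {
            const float cd = centroids[c * dims + d];
            const float *col = points + d * n;

            /* Unit-stride loop with no cross-iteration dependency:
             * a compiler can typically map this onto SIMD lanes. */
            for (size_t i = 0; i < n; ++i) {
                const float diff = col[i] - cd;
                scratch[i] += diff * diff;
            }
        }

        /* Keep the closest centroid seen so far for each point. */
        for (size_t i = 0; i < n; ++i) {
            if (scratch[i] < best_dist[i]) {
                best_dist[i] = scratch[i];
                labels[i] = (int)c;
            }
        }
    }
}
```

In this sketch the distance computation, which dominates the cost of each K-Means iteration, is expressed as long contiguous loops over points rather than short loops over the dimensions of a single point. Thread-level parallelism could be layered on top by splitting the range of points across cores, though how the paper divides work between CPU and MIC is not stated in the abstract.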