Using Multi-Core HW/SW Co-design Architecture for Accelerating K-means Clustering Algorithm

Hadi Mardani Kamali
DOI: https://doi.org/10.48550/arXiv.1807.09250
2018-07-10
Abstract:The capability of classifying and clustering a desired set of data is an essential part of building knowledge from data. However, as the size and dimensionality of input data increases, the run-time for such clustering algorithms is expected to grow superlinearly, making it a big challenge when dealing with BigData. K-mean clustering is an essential tool for many big data applications including data mining, predictive analysis, forecasting studies, and machine learning. However, due to large size (volume) of Big-Data, and large dimensionality of its data points, even the application of a simple k-mean clustering may become extremely time and resource demanding. Specially when it is necessary to have a fast and modular dataset analysis flow. In this paper, we demonstrate that using a two-level filtering algorithm based on binary kd-tree structure is able to decrease the time of convergence in K-means algorithm for large datasets. The two-level filtering algorithm based on binary kd-tree structure evolves the SW to naturally divide the classification into smaller data sets, based on the number of available cores and size of logic available in a target FPGA. The empirical result on this two-level structure over multi-core FPGA-based architecture provides 330X speed-up compared to a conventional software-only solution.
Distributed, Parallel, and Cluster Computing,Hardware Architecture
What problem does this paper attempt to address?