Making Frequent-Pattern Mining Scalable, Efficient, and Compact on Nonvolatile Memories
Chaoshu Yang, Po-Chun Huang, Yi Lin, Jiaqi Dong, Duo Liu, Yujuan Tan, Liang
DOI: https://doi.org/10.1109/tcad.2020.3015455
IF: 2.9
2021-01-01
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Abstract: Frequent-pattern mining is a common means of revealing the hidden trends behind data. However, most frequent-pattern mining algorithms are designed for dynamic random-access memory (DRAM) rather than for nonvolatile memories (NVMs), which are preferred by energy-limited systems. Because the characteristics of NVMs differ greatly from those of DRAM, existing frequent-pattern mining algorithms suffer from write amplification and energy waste when they run on NVMs. Moreover, the design complexity is exacerbated when a parallel computing architecture is introduced to speed up the mining process. A scalable, time-efficient, and energy-economic solution to the frequent-pattern mining problem is thus urgently needed. Based on the well-known frequent-pattern tree (FP-tree) approach to frequent-pattern mining, this article proposes parallel EvFP-tree (PevFP-tree), a parallel frequent-pattern mining solution for NVMs. By considering the NVM characteristics, PevFP-tree accelerates the mining process and improves energy efficiency compared with a straightforward design of FP-trees on the parallel architecture. Moreover, PevFP-tree offers superior scalability in terms of both the degree of parallelism of the mining algorithm and the branching factor of its tree structure. Observing that keys are often sparsely distributed in FP-trees, we also propose a compression technique for PevFP-tree, namely, compressed PevFP-tree (CpevFP-tree), which further improves the time and energy efficiency of PevFP-tree. The proposed PevFP-tree and CpevFP-tree are evaluated through a series of experiments on realistic datasets from diverse application scenarios, in which CpevFP-tree achieves, on average, a performance improvement of 88.73% over a straightforward design of FP-trees in the parallel architecture and of 79.47% over PevFP-tree.
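For readers unfamiliar with the FP-tree approach that the abstract builds on, the following is a minimal sketch of the textbook two-pass FP-tree construction. It is not the authors' PevFP-tree or CpevFP-tree, whose designs the abstract does not detail; the names `FPNode`, `build_fp_tree`, and `min_support` are illustrative assumptions.

```python
from collections import Counter


class FPNode:
    """One node of a textbook FP-tree (illustrative, not the paper's layout)."""

    def __init__(self, item=None, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}  # maps item -> FPNode


def build_fp_tree(transactions, min_support):
    """Build a classic FP-tree; returns (root, frequent_item_counts)."""
    # Pass 1: count how many transactions contain each item,
    # then keep only items that meet the support threshold.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_support}

    root = FPNode()
    # Pass 2: insert each transaction with its frequent items sorted by
    # descending support (ties broken by item name) so prefixes are shared.
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1  # each insertion increments counters on its path
            node = child
    return root, frequent
```

Calling `build_fp_tree([["a", "b"], ["b", "c"], ["a", "b", "c"], ["b"]], 2)` yields a tree whose single top-level branch starts at item `b`, since `b` appears in every transaction and is therefore ordered first.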