Revisiting data reduction for boolean matrix factorization algorithms based on formal concept analysis
Yang, Lanzhen; Zhang, Chengling; Wu, Jiaming
DOI: https://doi.org/10.1007/s13042-024-02226-z
2024-06-12
International Journal of Machine Learning and Cybernetics
Abstract: Boolean Matrix Factorization (BMF) helps unveil hidden patterns in boolean datasets and is a powerful tool in machine learning. However, when dealing with large datasets, reducing data size becomes crucial for BMF algorithms. In this paper, we revisit data reduction for BMF algorithms based on Formal Concept Analysis (FCA) and propose novel reduction approaches, aiming to minimize the impact of data reduction on factor quality. Specifically, we introduce the concept of intent vectors and present incremental algorithms, along with their associated theorems, for capturing and quantifying these vectors, thereby facilitating a reduction in data size. More importantly, we propose two innovative approaches based on FCA principles that identify and eliminate redundant rows in datasets through distinct deletion strategies. The first approach incrementally deletes rows while preserving the intent vectors of attribute concepts, thus maintaining the quality of factors. The second approach progressively removes rows from the dataset already reduced by the first approach, gradually adjusting the amount of concept loss to minimize any degradation in factor quality. Experiments demonstrate that our first reduction algorithm significantly decreases data size without degrading factor quality, consistently outperforming current leading algorithms with a success rate. Our second algorithm outperformed the existing algorithm in 72 out of 96 comparisons, greatly reducing data size with minimal loss in factor quality.
computer science, artificial intelligence
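
To make the terms in the abstract concrete, the following is a minimal sketch in plain Python and is not the paper's algorithm. It shows the Boolean matrix product that BMF approximates, the intent of an attribute concept in the FCA sense (the attributes shared by every row that contains a given attribute), and a naive greedy pass that drops a row only when all of these intents stay unchanged. Reading the "intent vector" as the 0/1 encoding of that intent is an assumption, and the function names `boolean_product`, `attribute_intent`, `all_intents`, and `greedy_row_reduction` are hypothetical.

```python
def boolean_product(A, B):
    """Boolean matrix product: (A o B)[i][j] = OR over k of (A[i][k] AND B[k][j])."""
    return [[int(any(a and b for a, b in zip(row, col))) for col in zip(*B)]
            for row in A]


def attribute_intent(I, j):
    """Intent of the attribute concept of column j, encoded as a 0/1 vector.

    The extent is every row containing attribute j; the intent is every
    attribute shared by all rows of that extent. (Treating this vector as
    the paper's "intent vector" is an assumption made for illustration.)
    """
    extent = [row for row in I if row[j]]
    return [int(all(row[k] for row in extent)) for k in range(len(I[0]))]


def all_intents(I):
    """Intents of the attribute concepts of all columns of I."""
    return [attribute_intent(I, j) for j in range(len(I[0]))]


def greedy_row_reduction(I):
    """Naive illustration: delete a row whenever every intent is preserved."""
    target = all_intents(I)
    kept = list(I)
    i = 0
    while i < len(kept):
        candidate = kept[:i] + kept[i + 1:]
        if candidate and all_intents(candidate) == target:
            kept = candidate  # row i was redundant; re-test the new row i
        else:
            i += 1            # row i is needed; move on
    return kept


# A 4x3 Boolean dataset and a rank-2 Boolean factorization of it.
I = [[1, 1, 0],
     [1, 1, 0],
     [0, 1, 1],
     [1, 1, 1]]
A = [[1, 0], [1, 0], [0, 1], [1, 1]]   # object-factor matrix
B = [[1, 1, 0], [0, 1, 1]]             # factor-attribute matrix

reduced = greedy_row_reduction(I)
```

In this toy example `boolean_product(A, B)` reproduces `I` exactly, and the greedy pass keeps only the two rows `[1, 1, 0]` and `[0, 1, 1]`, which still yield the same attribute-concept intents as the full dataset. The quadratic recomputation of intents is chosen for readability only; the incremental algorithms described in the abstract are designed to avoid that cost.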