Pheromone-Guided Parallel Rough Hypercuboid Attribute Reduction Algorithm
Weiping Ding, Hongcheng Yao, Hengrong Ju, Jiashuang Huang, Shu Jiang, Yuepeng Chen
DOI: https://doi.org/10.1016/j.asoc.2024.111479
IF: 8.7
2024-01-01
Applied Soft Computing
Abstract: In knowledge discovery and data mining, removing redundant and irrelevant attributes is crucial, yet traditional algorithms struggle with efficiency on high-dimensional big data. To address this problem, this paper proposes a novel global-search attribute reduction method, RHC-IGWO (Rough Hypercuboid Structure via Improved Grey Wolf Optimizer), which integrates the rough hypercuboid method with an Improved Grey Wolf Optimizer (IGWO) equipped with a pheromone mechanism. The algorithm is further embedded into the Apache Spark parallel computing framework, yielding PcRHC-IGWO (Parallel computing Rough Hypercuboid Structure via Improved Grey Wolf Optimizer), to accelerate and simplify the attribute reduction process. The algorithm divides the decision table into several independent blocks, introduces the pheromone mechanism to simulate wolf pack behavior, and uses the IGWO for global search, which supports both efficient local search and global information sharing between individuals. Each individual's position is initialized by calculating the relevance between attributes, and the pheromone value is dynamically updated according to the reduction quality, so that the search automatically focuses on more promising attribute regions. Experiments on public and real datasets demonstrate the RHC-IGWO algorithm's significant speedup and its efficacy in maintaining or enhancing classification accuracy. Particularly noteworthy is its performance on schizophrenia datasets, where the proposed method achieves classification accuracies of 86.2%, 88.89%, and 92.86% across various classifiers. These results underline its potential in advanced data analysis scenarios. Additionally, on some large-scale datasets, processing time is reduced by 85.71%, showing the algorithm's efficiency on complex and voluminous data.
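The abstract gives no formulas for the pheromone mechanism, so the following is only a minimal Python sketch of the general idea: pheromone values are updated according to the quality of an evaluated attribute subset, and later sampling is biased toward attributes with more pheromone. The evaporation-plus-deposit rule, the function names (`update_pheromone`, `sample_subset`, `toy_quality`, `run`), and the toy quality measure are all assumptions made for illustration. They are not the paper's RHC-IGWO update, and the rough hypercuboid evaluation, the grey wolf search, and the Spark partitioning are omitted.

```python
import random


def update_pheromone(pheromone, subset, quality, evaporation=0.1):
    """Evaporate every trail, then deposit `quality` on attributes in `subset`.

    Assumed rule (ant-colony style), not taken from the paper.
    """
    updated = []
    for idx, tau in enumerate(pheromone):
        tau = (1.0 - evaporation) * tau
        if idx in subset:
            tau += quality
        updated.append(tau)
    return updated


def sample_subset(pheromone, size, rng):
    """Draw `size` distinct attribute indices, weighted by pheromone."""
    candidates = list(range(len(pheromone)))
    chosen = []
    for _ in range(size):
        weights = [pheromone[i] for i in candidates]
        pick = rng.choices(candidates, weights=weights, k=1)[0]
        chosen.append(pick)
        candidates.remove(pick)
    return sorted(chosen)


def toy_quality(subset, informative):
    """Hypothetical stand-in for reduction quality:
    the fraction of chosen attributes that are informative."""
    return len(set(subset) & set(informative)) / len(subset)


def run(n_attributes=8, informative=(1, 4), iterations=200, seed=0):
    """Repeat sample -> evaluate -> update and return the final pheromone."""
    rng = random.Random(seed)
    pheromone = [1.0] * n_attributes
    for _ in range(iterations):
        subset = sample_subset(pheromone, 2, rng)
        quality = toy_quality(subset, informative)
        pheromone = update_pheromone(pheromone, set(subset), quality)
    return pheromone


if __name__ == "__main__":
    for index, tau in enumerate(run()):
        print(f"attribute {index}: pheromone {tau:.3f}")
```

In this toy setting, attributes that appear in high-quality subsets accumulate pheromone and tend to be sampled more often, which is the sense in which the search gives more focus to promising attribute regions. In a real reduction task, `toy_quality` would be replaced by a dependency or relevance measure computed from the decision table.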