Parallel incremental efficient attribute reduction algorithm based on attribute tree
Weiping Ding, Tingzhen Qin, Xinjie Shen, Hengrong Ju, Haipeng Wang, Jiashuang Huang, Ming Li
DOI: https://doi.org/10.1016/j.ins.2022.08.044
IF: 8.1
2022-09-01
Information Sciences
Abstract: Attribute reduction is an important application of rough sets, and reducing massive dynamic datasets efficiently has long been a major goal of researchers. Traditional incremental methods perform reduction by updating approximations. However, these methods must evaluate all attributes and repeatedly calculate their importance, so their time complexity is high and reducing large decision systems becomes inefficient. To solve this problem, we propose an incremental acceleration strategy based on attribute trees. The key step is to cluster all attributes into multiple trees for incremental attribute evaluation. Specifically, we first select the appropriate attribute tree for attribute evaluation according to an attribute tree correlation measure, which reduces the time complexity. Next, a branch coefficient is added to the stopping criterion; it increases with the branch depth and triggers an exit from the loop once it reaches the maximum threshold. This avoids redundant calculation and improves efficiency. Building on these improvements, we propose an incremental attribute reduction algorithm based on attribute trees. Finally, a Spark parallel mechanism is added to parallelize data processing, yielding a parallel incremental efficient attribute reduction algorithm based on the attribute tree. Experimental results on the Shuttle dataset show that the time consumption of our algorithm is more than 40% lower than that of the classical IARC algorithm while maintaining good classification performance. In addition, after adding the Spark parallelization mechanism, the time is shortened by more than 87% relative to the benchmark.
computer science, information systems
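
Illustrative note: the abstract says that traditional methods must evaluate all attributes and repeatedly calculate their importance. The sketch below shows what such a baseline might look like, using the standard rough-set dependency degree (the share of objects in the positive region) and a greedy forward selection. It is not the authors' attribute-tree algorithm and not IARC. The function names, the data layout (rows as tuples of attribute values, decisions as a parallel list) and the greedy stopping rule are all assumptions made for illustration.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows, decisions, attrs):
    """Dependency degree: fraction of rows lying in decision-pure blocks."""
    if not rows:
        return 0.0
    positive = sum(
        len(block)
        for block in partition(rows, attrs)
        if len({decisions[i] for i in block}) == 1
    )
    return positive / len(rows)


def greedy_reduct(rows, decisions):
    """Hypothetical baseline: add the best attribute until dependency matches
    that of the full attribute set. Every remaining attribute is re-evaluated
    in each round, which is the cost the abstract describes."""
    all_attrs = list(range(len(rows[0])))
    target = dependency(rows, decisions, all_attrs)
    reduct = []
    while dependency(rows, decisions, reduct) < target:
        remaining = [a for a in all_attrs if a not in reduct]
        best = max(remaining, key=lambda a: dependency(rows, decisions, reduct + [a]))
        reduct.append(best)
    return reduct


# Toy decision table (made-up data): the decision equals attribute 1.
rows = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0)]
decisions = [0, 1, 0, 1]
print(greedy_reduct(rows, decisions))
```

Each round of the loop scans every remaining attribute. An approach that restricts evaluation to a selected group of attributes, as the abstract describes for attribute trees, would shrink that inner scan.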