Efficient and Fast Algorithm for Attribute Reduction of Large Dimensional Data Using Rough Set Theory on Graphics Processing Unit

V. K. Hanuman Turaga, Srilatha Chebrolu
DOI: https://doi.org/10.1007/s13369-024-09147-7
IF: 2.807
2024-06-16
Arabian Journal for Science and Engineering
Abstract: Attribute reduction, or attribute subset selection, is among the most important data pre-processing tasks in applications across the engineering domains that fall under the broad spectrum of artificial intelligence. The attribute subset selection process and the significance of each selected attribute greatly affect the classification performance of any machine learning algorithm. Rough set theory-based solutions for attribute subset selection have proven very effective for categorical information systems. However, most of these attribute reduction algorithms are serial in nature. They are either inefficient at processing datasets with a very large number of dimensions, or their efficiency is overshadowed by high computational costs. Hence, they are becoming inapplicable to current data processing requirements. To address this problem, we first propose a novel and efficient attribute reduction algorithm named Reduction of Attributes based on Association and Separation (RAAS). This algorithm is based on two measures: the degree of association (DA) of objects within a class and the degree of separation (DS) among objects of different classes. These measures are used to evaluate the significance of each attribute as well as the classification ability of each attribute subset. A sequential backward elimination strategy using the DA and the DS is designed to obtain the optimal attribute subset. The RAAS algorithm is evaluated against other typical reduction algorithms on a few publicly available standard datasets from the UCI data repository. The experimental results show that RAAS produces better classification accuracies than the others. We then design a parallel version of RAAS, called Parallel Attribute Reduction Algorithm based on Association and Separation (PARAAS), which is both efficient and fast. The PARAAS algorithm is the first algorithm designed specifically to perform attribute reduction of large dimensional categorical datasets on graphics processing units (GPUs) that support CUDA. Experimental analysis suggests that PARAAS can produce high classification accuracies with markedly low execution times.
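The sequential backward elimination strategy mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not define the DA and DS measures, so this sketch substitutes the classical rough-set dependency degree (the fraction of objects whose equivalence class is pure in the decision label) as a stand-in criterion. The function names `dependency_degree` and `backward_eliminate` and the toy table are hypothetical, and the code is serial Python, not the authors' RAAS or PARAAS implementation.

```python
from collections import defaultdict


def dependency_degree(rows, labels, attrs):
    """Stand-in criterion (NOT the paper's DA/DS): the fraction of objects
    whose equivalence class under `attrs` contains a single decision label."""
    if not rows:
        return 1.0
    seen_labels = defaultdict(set)   # equivalence class -> labels observed
    sizes = defaultdict(int)         # equivalence class -> number of objects
    for row, label in zip(rows, labels):
        key = tuple(row[a] for a in attrs)
        seen_labels[key].add(label)
        sizes[key] += 1
    consistent = sum(sizes[k] for k, s in seen_labels.items() if len(s) == 1)
    return consistent / len(rows)


def backward_eliminate(rows, labels, score=dependency_degree):
    """Start from all attributes and drop each one in turn whenever the
    remaining subset still scores as well as the full attribute set."""
    selected = list(range(len(rows[0])))
    target = score(rows, labels, selected)
    for a in list(selected):
        trial = [b for b in selected if b != a]
        if score(rows, labels, trial) >= target:
            selected = trial
    return selected


# Hypothetical toy categorical table: three condition attributes per object.
rows = [
    ("sunny", "hot", "x"),
    ("sunny", "mild", "y"),
    ("rainy", "hot", "x"),
    ("rainy", "mild", "y"),
]
labels = ["no", "no", "yes", "yes"]

reduct = backward_eliminate(rows, labels)
print(reduct)
```

In this toy table only attribute 0 determines the class, so the elimination pass discards attributes 1 and 2. Passing a different `score` function is where measures such as DA and DS would plug in, assuming they can be evaluated on an attribute subset.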
multidisciplinary sciences