Abstract: Attribute reduction, or attribute subset selection, is one of the most important and essential data pre-processing tasks in applications across the many engineering domains that fall under the broad umbrella of artificial intelligence. The choice of attribute subset, and the significance of each selected attribute, strongly affect the classification performance of any machine learning algorithm. Rough set theory-based solutions for attribute subset selection have proven very effective for categorical information systems. However, most of these attribute reduction algorithms are serial. They either cannot efficiently process datasets with a very large number of dimensions, or their effectiveness is offset by high computational costs. As a result, they are increasingly unable to meet current data processing requirements. To address this problem, we first propose a novel and efficient attribute reduction algorithm named Reduction of Attributes based on Association and Separation (RAAS). RAAS is based on two measures: the degree of association (DA) among objects within a class and the degree of separation (DS) among objects of different classes. These measures are used to evaluate both the significance of each attribute and the classification ability of each attribute subset. A sequential backward elimination strategy driven by DA and DS is designed to obtain the optimal attribute subset. RAAS is evaluated against other typical reduction algorithms on several publicly available standard datasets from the UCI Machine Learning Repository. The experimental results show that RAAS achieves better classification accuracy than the other algorithms. We then design a parallel version of RAAS, our second proposed algorithm, called Parallel Attribute Reduction Algorithm based on Association and Separation (PARAAS), which is both efficient and fast.
PARAAS is the first algorithm designed specifically to perform attribute reduction of high-dimensional categorical datasets on CUDA-enabled graphics processing units (GPUs). Experimental analysis suggests that PARAAS can achieve high classification accuracy with significantly lower execution times.
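The abstract names DA and DS but does not define them. The Python sketch below therefore uses hypothetical pair-counting definitions to show how two such measures can drive sequential backward elimination. DA is taken as the fraction of same-class object pairs that agree on every retained attribute. DS is taken as the fraction of different-class object pairs that differ on at least one retained attribute. An attribute may be dropped only if DS stays at its full-attribute-set value. Among those candidates, the one whose removal yields the highest DA is dropped first. The function names, the toy table, and the tie-breaking rule are all assumptions made for illustration, not the RAAS definitions.

```python
from itertools import combinations

# HYPOTHETICAL definitions: the abstract names DA and DS but does not define them.

def degree_of_association(X, y, attrs):
    """DA (assumed): fraction of same-class pairs that agree on every attribute in attrs."""
    same = agree = 0
    for i, j in combinations(range(len(X)), 2):
        if y[i] == y[j]:
            same += 1
            if all(X[i][a] == X[j][a] for a in attrs):
                agree += 1
    return agree / same if same else 1.0


def degree_of_separation(X, y, attrs):
    """DS (assumed): fraction of different-class pairs that differ on at least one attribute in attrs."""
    diff = sep = 0
    for i, j in combinations(range(len(X)), 2):
        if y[i] != y[j]:
            diff += 1
            if any(X[i][a] != X[j][a] for a in attrs):
                sep += 1
    return sep / diff if diff else 1.0


def backward_elimination(X, y):
    """Sequential backward elimination: DS acts as a constraint, DA as a preference."""
    attrs = list(range(len(X[0])))
    target = degree_of_separation(X, y, attrs)  # separation achieved by the full attribute set
    while len(attrs) > 1:
        candidates = []
        for a in attrs:
            rest = [b for b in attrs if b != a]
            # Only attributes whose removal preserves the full-set separation may be dropped.
            if degree_of_separation(X, y, rest) >= target:
                candidates.append((degree_of_association(X, y, rest), a))
        if not candidates:
            break  # every remaining attribute is needed to keep DS at its target
        # Drop the attribute whose removal gives the highest DA; ties go to the lowest index.
        _, drop = max(candidates, key=lambda t: (t[0], -t[1]))
        attrs.remove(drop)
    return attrs


# Toy categorical decision table (hypothetical): outlook, temperature, humidity.
X = [
    ("sunny", "hot", "high"),
    ("sunny", "mild", "high"),
    ("rain", "mild", "normal"),
    ("rain", "hot", "normal"),
]
y = ["no", "no", "yes", "yes"]

print(backward_elimination(X, y))  # [2]
```

In the toy table, outlook and humidity each separate the two classes on their own. The tie-breaking rule decides which one survives. Every pairwise comparison in this sketch is independent of the others, so this kind of work maps naturally onto data-parallel hardware. The sketch itself runs serially and makes no claim about how PARAAS organizes its GPU computation.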

A Fast Parallel Attribute Reduction Algorithm Using Apache Spark

Parallel incremental efficient attribute reduction algorithm based on attribute tree

Distributed High-Dimension Matrix Operation Optimization on Spark

Parallel Large-Scale Attribute Reduction on Cloud Systems

A Distributed Attribute Reduction Based on Neighborhood Evidential Conflict with Apache Spark

A Parallel Attribute Reduction Algorithm Based on Affinity Propagation Clustering

An Efficient Attribute Reduction Algorithm Using MapReduce

Efficient and Fast Algorithm for Attribute Reduction of Large Dimensional Data Using Rough Set Theory on Graphics Processing Unit

BiFuG2-Spark: Bi-directional Fuzzy Granular-Cabin Parallel Attribute Reduction Accelerator with Granular-Group Collaboration

RA-MRS: A High Efficient Attribute Reduction Algorithm in Big Data

A Parallel Attribute Reduction Method Based on Classification

Parallelization of Classification Algorithms Based on SparkR

A Parallel Minimum Attribute Co-reduction Accelerator based on Quantum-inspired SFLA and MapReduce Framework

Pheromone-Guided Parallel Rough Hypercuboid Attribute Reduction Algorithm

A Fast Parallel Random Forest Algorithm Based on Spark

PARA: A Positive-Region Based Attribute Reduction Accelerator

Design and Implementation of Parallel DBSCAN Algorithm Based on Spark

In-Memory Parallel Processing of Massive Remotely Sensed Data Using an Apache Spark on Hadoop YARN Model

Comparative Study on MapReduce and Spark for Big Data Analytics

A Caching-Based Parallel FP-Growth in Apache Spark

A Novel Rough Sets Positive Region Based Parallel Multi-reduction Algorithm