Abstract: Dimensionality reduction and feature subset selection are two techniques for reducing the attribute space of a feature set, which is an important component of both supervised and unsupervised classification or regression problems. While feature subset selection extracts a subset of the original attributes, dimensionality reduction produces linear combinations of the original attribute set. In this paper we investigate the relationship between attribute reduction techniques and the resulting classification accuracy for two very different application areas. On the one hand, we consider e-mail filtering, where various properties of e-mail messages are extracted; on the other hand, we consider drug discovery problems, where quantitative representations of molecular structures are encoded in terms of information-preserving descriptor values. Subsets of the original attributes constructed by filter and wrapper techniques, as well as subsets of linear combinations of the original attributes constructed by three different variants of principal component analysis (PCA), are compared in terms of the classification performance achieved with various machine learning algorithms. We successively reduce the size of the attribute sets and investigate the changes in the classification results. Moreover, we explore the relationship between the variance captured by the linear combinations within PCA and the classification accuracy. First results show that the classification accuracy based on PCA is highly sensitive to the type of data and that the variance captured by the principal components is not necessarily a reliable indicator of the classification performance.
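The last observation, that captured variance need not track classification performance, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the synthetic two-attribute data set, the closed-form 2-D PCA, the nearest-centroid classifier, and the helper names `fit_pca_2d`, `project`, `nearest_centroid_accuracy`, `make_data` and `run_experiment`. The data are built so that the high-variance direction carries almost no class information, while the low-variance direction separates the two classes.

```python
import math

def fit_pca_2d(points):
    """Closed-form PCA for 2-D data (illustrative helper, not the paper's code).

    Returns the mean and a list of (eigenvalue, unit_vector) pairs,
    ordered by decreasing variance.
    """
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    # Angle of the major axis of the symmetric 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    c, s = math.cos(theta), math.sin(theta)
    lam1 = sxx * c * c + 2 * sxy * c * s + syy * s * s  # variance along (c, s)
    lam2 = sxx * s * s - 2 * sxy * c * s + syy * c * c  # variance along (-s, c)
    return (mx, my), [(lam1, (c, s)), (lam2, (-s, c))]

def project(points, mean, axis):
    """Score of each centered point on a single principal axis."""
    return [(p[0] - mean[0]) * axis[0] + (p[1] - mean[1]) * axis[1] for p in points]

def nearest_centroid_accuracy(train_z, train_y, test_z, test_y):
    """Accuracy of a 1-D nearest-centroid classifier on held-out scores."""
    centroids = {}
    for label in sorted(set(train_y)):
        vals = [z for z, y in zip(train_z, train_y) if y == label]
        centroids[label] = sum(vals) / len(vals)
    hits = 0
    for z, y in zip(test_z, test_y):
        pred = min(centroids, key=lambda k: abs(z - centroids[k]))
        hits += int(pred == y)
    return hits / len(test_y)

def make_data():
    """Synthetic data (assumed): attribute 0 has a wide spread but is
    class-independent; attribute 1 separates the classes with a small spread."""
    X, y = [], []
    for i in range(20):
        x0 = i - 9.5
        for label, offset in ((0, -1.0), (1, 1.0)):
            X.append((x0, offset + 0.02 * x0))  # slight tilt of both class lines
            y.append(label)
    return X, y

def run_experiment():
    X, y = make_data()
    # Deterministic hold-out split: every other x0 value goes to the test set.
    train = [j for j in range(len(X)) if (j // 2) % 2 == 0]
    test = [j for j in range(len(X)) if (j // 2) % 2 == 1]
    X_tr, y_tr = [X[j] for j in train], [y[j] for j in train]
    X_te, y_te = [X[j] for j in test], [y[j] for j in test]

    mean, components = fit_pca_2d(X_tr)  # PCA is fitted on training data only
    total = sum(lam for lam, _ in components)
    results = []
    for k, (lam, axis) in enumerate(components, start=1):
        acc = nearest_centroid_accuracy(
            project(X_tr, mean, axis), y_tr, project(X_te, mean, axis), y_te
        )
        results.append({"component": k, "variance_ratio": lam / total, "accuracy": acc})
    return results

results = run_experiment()
for r in results:
    print(f"PC{r['component']}: variance share {r['variance_ratio']:.3f}, "
          f"accuracy {r['accuracy']:.2f}")
```

On this constructed example, the first principal component holds most of the variance but classifies close to chance level, whereas the second component, with a small variance share, separates the held-out points. Real data sets such as those studied in the paper will behave less sharply; the sketch only shows why variance alone can be a misleading selection criterion.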