Abstract: Characteristic gene selection and tumor classification of gene expression data play major roles in genomic research. Because gene expression data typically combine a small sample size with high dimensionality, it is common practice to reduce the dimensionality before applying machine learning-based methods to the expression data. In this context, classical principal component analysis (PCA) and its improved versions have been widely used. Recently, methods based on supervised discriminative sparse PCA have been developed to improve the performance of dimensionality reduction. However, such methods still have a limitation: most of them do not combine robustness to outliers and noise, label information, sparsity, and the capture of intrinsic geometrical structure in a single objective function. To address this drawback, we propose a novel PCA-based method, the robust Laplacian supervised discriminative sparse PCA (RLSDSPCA), which enforces the L2,1 norm on the error function and incorporates the graph Laplacian into supervised discriminative sparse PCA. To evaluate the efficacy of the proposed RLSDSPCA, we applied it to characteristic gene selection and tumor classification using gene expression data. The results demonstrate that RLSDSPCA, when used in combination with other related methods, can effectively identify new pathogenic genes associated with diseases. In addition, RLSDSPCA achieved the best performance among the compared state-of-the-art methods on tumor classification in terms of major performance metrics.
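The two ingredients the abstract names, the L2,1 norm and the graph Laplacian, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation and does not reproduce the RLSDSPCA objective, which is not stated here. The function names `l21_norm` and `knn_graph_laplacian`, the row-wise convention for the L2,1 norm, the binary k-nearest-neighbor weights, and the samples-as-rows layout are all assumptions made for illustration.

```python
import numpy as np


def l21_norm(E):
    """L2,1 norm under the row-wise convention (an assumption):
    the sum of the Euclidean norms of the rows of E."""
    return float(np.sqrt((E ** 2).sum(axis=1)).sum())


def knn_graph_laplacian(X, k=2):
    """Unnormalized Laplacian L = D - W of a symmetrized k-nearest-neighbor
    graph with binary weights. Rows of X are samples (an assumption)."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances between samples.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(sq, np.inf)  # a sample is never its own neighbor
    # Indices of the k closest samples for each row.
    idx = np.argsort(sq, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    W = np.maximum(W, W.T)  # connect i and j if either lists the other
    return np.diag(W.sum(axis=1)) - W


# Hypothetical residual matrix: the third row plays the role of an outlier.
E = np.array([[3.0, 4.0], [0.0, 0.0], [6.0, 8.0]])
l21 = l21_norm(E)               # rows contribute their norms: 5 + 0 + 10
fro_sq = float((E ** 2).sum())  # squared Frobenius norm: 25 + 0 + 100

# Hypothetical data: 6 samples with 3 features each.
X = np.random.default_rng(0).normal(size=(6, 3))
L = knn_graph_laplacian(X, k=2)
```

In this toy residual the outlier row contributes its norm (10) to the L2,1 value but its squared norm (100) to the squared Frobenius value, which is the usual motivation for using the L2,1 norm when robustness to outlying samples matters. The Laplacian built this way is symmetric with zero row sums, so a penalty of the form trace(Qᵀ L Q) is small when neighboring samples receive similar low-dimensional representations.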
The code and data sets used in the study are freely available at http://csbio.njust.edu.cn/bioinf/rlsdspca/.

The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jcim.1c01403. It contains the definition of the evaluation criteria; solutions of LSDSPCA and RLSDSPCA; convergence analysis of RLSDSPCA; notation details; and properties of RLSDSPCA (PDF).
