Abstract: The advent of high-throughput sequencing technologies has revolutionized the field of multi-omics patient data analysis. While these techniques offer a wealth of information, they often generate datasets whose number of features far exceeds the number of available cases. This discrepancy gives rise to the challenging “small-sample-size” problem, which significantly compromises the reliability of any subsequent estimate, whether supervised or unsupervised. Effective dimensionality reduction techniques are therefore needed to transform high-dimensional datasets into lower-dimensional spaces, making the data manageable and facilitating subsequent analyses. Unfortunately, defining a proper dimensionality reduction pipeline is not an easy task. Besides identifying the best dimensionality reduction method, one must choose the dimension of the lower-dimensional space into which each dataset is transformed; this choice influences all subsequent analyses and should therefore be carefully considered. Further, the availability of multi-modal data calls for proper data-fusion techniques that produce an integrated patient view in which redundant information is removed, while salient and complementary information across views is leveraged to improve the performance and reliability of both unsupervised and supervised learning techniques. This paper proposes leveraging the intrinsic dimensionality of each view in a multi-modal dataset to define the dimensionality of the lower-dimensional space into which the view is transformed by dimensionality reduction algorithms. Further, it presents a thorough experimental study that compares the traditional application of a single step of dimensionality reduction with a two-step approach, in which a prior feature selection is followed by feature extraction. Through this comparative evaluation, we scrutinize the performance of widely used dimensionality reduction algorithms. Importantly, we also investigate their impact on unsupervised data-fusion techniques, which are pivotal in biomedical research. Our findings shed light on the most effective strategies for handling high-dimensional multi-omics patient data, offering valuable insights for future studies in this domain.
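As a rough illustration of the idea described in the abstract, the sketch below estimates the intrinsic dimension of a single view and uses it as the target dimension for a later feature-extraction step. The abstract does not say which intrinsic-dimension estimator or which feature-selection criterion the paper uses, so everything here is an assumption: the estimator is a maximum-likelihood variant of the two-nearest-neighbour ratio approach, the selection step is a plain variance filter, and the names `select_top_variance`, `two_nn_intrinsic_dimension`, and the synthetic `view` are hypothetical.

```python
import math
import random
import statistics


def select_top_variance(rows, k):
    """Step 1 (feature selection, assumed criterion): keep the k highest-variance columns."""
    n_features = len(rows[0])
    variances = [statistics.pvariance([r[j] for r in rows]) for j in range(n_features)]
    keep = sorted(range(n_features), key=lambda j: variances[j], reverse=True)[:k]
    keep.sort()  # preserve the original column order
    return [[r[j] for j in keep] for r in rows]


def two_nn_intrinsic_dimension(rows):
    """Estimate intrinsic dimension from ratios of 2nd to 1st nearest-neighbour distances.

    If mu_i = r2_i / r1_i follows a Pareto law with shape d, the maximum-likelihood
    estimate of d is N / sum(log(mu_i)).
    """
    log_ratios = []
    for i, p in enumerate(rows):
        dists = sorted(math.dist(p, q) for j, q in enumerate(rows) if j != i)
        r1, r2 = dists[0], dists[1]
        if r1 > 0.0:  # skip exact duplicates, whose ratio is undefined
            log_ratios.append(math.log(r2 / r1))
    return len(log_ratios) / sum(log_ratios)


# Synthetic "view" (illustrative only): 300 samples with 6 observed features
# that are all linear combinations of 2 latent factors.
rng = random.Random(0)
view = []
for _ in range(300):
    u, v = rng.random(), rng.random()
    view.append([u, v, u + v, u - v, 0.5 * u, 0.5 * v])

selected = select_top_variance(view, k=4)          # step 1: feature selection
id_estimate = two_nn_intrinsic_dimension(selected)  # intrinsic dimension of the view
target_dim = max(1, round(id_estimate))             # dimension for step 2 (feature extraction)
print(f"estimated intrinsic dimension: {id_estimate:.2f}, target dimension: {target_dim}")
```

In a real pipeline, `target_dim` would be passed as the output dimension of whichever feature-extraction method is applied to the view. The pairwise distance loop is quadratic in the number of samples, which is acceptable for the small cohorts the abstract describes but would need a nearest-neighbour index for larger ones.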

Supervised dimensionality reduction for big data

Computational and Theoretical Analysis of Supervised Dimensionality Reduction

Linear Dimensionality Reduction: Survey, Insights, and Generalizations

A Perception-Driven Approach to Supervised Dimensionality Reduction for Visualization

Divergence Maximizing Linear Projection for Supervised Dimension Reduction

Deep Dimension Reduction for Supervised Representation Learning

Multilevel Functional Principal Component Analysis for High-Dimensional Data

Ten quick tips for effective dimensionality reduction

Modern Dimension Reduction

Intrinsic-Dimension analysis for guiding dimensionality reduction and data-fusion in multi-omics data processing

An Efficient Sufficient Dimension Reduction Method for Identifying Genetic Variants of Clinical Significance

Supervised Linear Dimension-Reduction Methods: Review, Extensions, and Comparisons

Joint Dimensionality Reduction for Separable Embedding Estimation

Exploring Dimension Learning Via a Penalized Probabilistic Principal Component Analysis

Out-of-Core Dimensionality Reduction for Large Data via Out-of-Sample Extensions

Supervised Discriminative Sparse PCA with Adaptive Neighbors for Dimensionality Reduction

Using Dimension Reduction to Improve the Classification of High-dimensional Data

Supervised Multivariate Learning with Simultaneous Feature Auto-grouping and Dimension Reduction

Large-margin Weakly Supervised Dimensionality Reduction

Dimension reduction for covariates in network data

Best Subset Solution Path for Linear Dimension Reduction Models using Continuous Optimization