Abstract: High-dimensional datasets present substantial challenges in statistical modeling across many disciplines, necessitating effective dimensionality reduction methods. Deep learning approaches, notable for their capacity to distill essential features from complex data, facilitate modeling, visualization, and compression through reduced-dimensionality latent feature spaces and have wide applications from bioinformatics to earth sciences. This study introduces a novel workflow to evaluate the stability of these latent spaces, ensuring consistency and reliability in subsequent analyses. Stability, defined as the invariance of latent spaces to minor perturbations of the data, training realizations, and parameters, is crucial yet often overlooked. Our proposed methodology delineates three types of stability within latent spaces (sample, structural, and inferential) and introduces a suite of metrics for comprehensive evaluation. We implement this workflow across 500 autoencoder realizations and three datasets, encompassing both synthetic and real-world scenarios, to explain latent space dynamics. Employing k-means clustering and the modified Jonker-Volgenant algorithm for class alignment, alongside anisotropy metrics and convex hull analysis, we introduce adjusted stress and Jaccard dissimilarity as novel stability indicators. Our findings highlight inherent instabilities in latent feature spaces and demonstrate the workflow's efficacy in quantifying and interpreting these instabilities. This work advances the understanding of latent feature spaces, promoting improved model interpretability and quality control and supporting more informed decision-making in diverse analytical workflows that leverage deep learning.
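
The class-alignment step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that cluster labels (for example from k-means) are already available for the same samples in two autoencoder realizations. The labels are matched with SciPy's `linear_sum_assignment`, which is documented as a modified Jonker-Volgenant implementation, and a plain per-class Jaccard dissimilarity is then computed on the matched memberships. The function names `align_labels` and `jaccard_dissimilarity` are hypothetical, and the exact metric definitions used in the paper (including adjusted stress) are not given in the abstract, so this should not be read as the authors' implementation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def align_labels(ref_labels, new_labels, n_classes):
    """Relabel new_labels so that its classes best overlap those of ref_labels.

    Hypothetical helper: both inputs are integer labels in [0, n_classes)
    for the same samples, taken from two different realizations.
    """
    ref_labels = np.asarray(ref_labels)
    new_labels = np.asarray(new_labels)
    # overlap[i, j] = number of samples in reference class i and new class j
    overlap = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(overlap, (ref_labels, new_labels), 1)
    # Solve the assignment problem, maximizing the total overlap
    ref_idx, new_idx = linear_sum_assignment(overlap, maximize=True)
    mapping = np.empty(n_classes, dtype=int)
    mapping[new_idx] = ref_idx  # new class j is renamed to reference class i
    return mapping[new_labels]


def jaccard_dissimilarity(ref_labels, aligned_labels, n_classes):
    """Per-class 1 - |A ∩ B| / |A ∪ B| over sample memberships (assumed form)."""
    ref_labels = np.asarray(ref_labels)
    aligned_labels = np.asarray(aligned_labels)
    out = np.zeros(n_classes)
    for k in range(n_classes):
        a = ref_labels == k
        b = aligned_labels == k
        union = np.logical_or(a, b).sum()
        inter = np.logical_and(a, b).sum()
        # A class that is empty in both labelings is treated as fully stable
        out[k] = 0.0 if union == 0 else 1.0 - inter / union
    return out


if __name__ == "__main__":
    # Toy labels for eight samples in two realizations (illustrative only)
    ref = [0, 0, 0, 1, 1, 2, 2, 2]
    new = [2, 2, 2, 0, 0, 1, 1, 0]
    aligned = align_labels(ref, new, n_classes=3)
    print(aligned)
    print(jaccard_dissimilarity(ref, aligned, n_classes=3))
```

In this sketch a dissimilarity of 0 means that a class contains exactly the same samples in both realizations, while values near 1 indicate that class membership changed almost completely between realizations.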

Bootstrap Confidence Regions for Learned Feature Embeddings

Measuring the Stability of Learned Features

The Big Data Bootstrap

Scalable Nonparametric Sampling from Multimodal Posteriors with the Posterior Bootstrap

Variable selection for nonlinear dimensionality reduction of biological datasets through bootstrapping of correlation networks

Boosting Bootstrap FLD Subspaces for Multiclass Problem

Uncertainty-Aware Bootstrap Learning for Joint Extraction on Distantly-Supervised Data

Confidence Intervals and Simultaneous Confidence Bands Based on Deep Learning

Learning Random Fourier Features by Hybrid Constrained Optimization

Asymptotic and bootstrap tests for subspace dimension

Bootstrapping promotes the RSFC‐behavior associations: An application of individual cognitive traits prediction

Understanding Learned Models by Identifying Important Features at the Right Resolution

Embracing Uncertainty Flexibility: Harnessing a Supervised Tree Kernel to Empower Ensemble Modelling for 2D Echocardiography-Based Prediction of Right Ventricular Volume

Efficient Bayesian High-Dimensional Classification via Random Projection with Application to Gene Expression Data

Deep Ensembles: A Loss Landscape Perspective

Bootstrap Your Own Variance

A Bootstrap Hypothesis Test for High-Dimensional Mean Vectors

Long term follow-up of "full metal jacket" of de novo coronary lesions with new generation Zotarolimus-eluting stents.

Divergence Regulated Encoder Network for Joint Dimensionality Reduction and Classification

Bootstrapping for multivariate linear regression models

Evaluating the Stability of Deep Learning Latent Feature Spaces