Abstract: This work provides a computable, direct, and mathematically rigorous approximation to the differential geometry of class manifolds for high-dimensional data, along with nonlinear projections from input space onto these class manifolds. The tools are applied to neural network image classifiers, where we generate novel, on-manifold data samples and implement a projected gradient descent algorithm for on-manifold adversarial training. The susceptibility of neural networks (NNs) to adversarial attacks highlights the brittle nature of NN decision boundaries in input space. Introducing adversarial examples during training has been shown to reduce this susceptibility; however, it has also been shown to reduce the accuracy of the classifier if the examples are not valid examples for that class. Realistic "on-manifold" examples have previously been generated from class manifolds in the latent space of an autoencoder. Our work explores these phenomena in a geometric and computational setting that is much closer to the raw, high-dimensional input space than a variational autoencoder (VAE) or other black-box dimensionality reduction can provide. We employ conformally invariant diffusion maps (CIDM) to approximate class manifolds in diffusion coordinates, and develop the Nyström projection to project novel points onto class manifolds in this setting. On top of the manifold approximation, we leverage the spectral exterior calculus (SEC) to determine geometric quantities such as tangent vectors of the manifold. We use these tools to obtain adversarial examples that reside on a class manifold, yet fool a classifier. These misclassifications then become explainable in terms of human-understandable manipulations within the data, by expressing the on-manifold adversary in the semantic basis on the manifold.
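To make the diffusion-coordinate and Nyström steps mentioned in the abstract more concrete, the following is a minimal sketch in Python with NumPy. It is not the authors' implementation: it uses a plain fixed-bandwidth Gaussian diffusion map rather than CIDM, and it shows only the standard Nyström out-of-sample extension of diffusion coordinates, not the full projection back onto a class manifold in input space. The function names `diffusion_map` and `nystrom_extend`, and the parameters `eps` (kernel bandwidth) and `k` (number of coordinates), are assumptions chosen for illustration.

```python
import numpy as np


def diffusion_map(X, eps, k):
    """Plain Gaussian-kernel diffusion map (illustrative; NOT CIDM).

    X   : (n, d) array of data points
    eps : kernel bandwidth (hypothetical parameter name)
    k   : number of leading eigenpairs to keep
    Returns (lam, psi): the k largest eigenvalues and the matching right
    eigenvectors of the Markov matrix P = D^{-1} K.
    """
    # Pairwise squared Euclidean distances and Gaussian kernel matrix.
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-sq_dists / eps)
    d = K.sum(axis=1)

    # S = D^{-1/2} K D^{-1/2} is symmetric and has the same eigenvalues as P.
    S = K / np.sqrt(np.outer(d, d))
    vals, vecs = np.linalg.eigh(S)

    # eigh returns ascending eigenvalues; keep the k largest.
    idx = np.argsort(vals)[::-1][:k]
    lam = vals[idx]

    # Convert eigenvectors of S into right eigenvectors of P.
    psi = vecs[:, idx] / np.sqrt(d)[:, None]
    return lam, psi


def nystrom_extend(x_new, X, eps, lam, psi):
    """Nystrom out-of-sample extension of the diffusion coordinates.

    Evaluates psi_j(x_new) = (1 / lam_j) * sum_i p(x_new, x_i) * psi_j(x_i),
    where p is the row-normalized Gaussian kernel.
    """
    k_new = np.exp(-((X - x_new) ** 2).sum(axis=-1) / eps)
    p_new = k_new / k_new.sum()
    return (p_new @ psi) / lam
```

By construction, applying `nystrom_extend` to a point that is already in the training set reproduces that point's row of `psi` up to floating-point error, which is a convenient sanity check before extending to genuinely new points.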

Manifold Integrated Gradients: Riemannian Geometry for Feature Attribution

Deep Manifold Computing and Visualization

Deep Manifold Computing and Visualization Using Elastic Locally Isometric Smoothness

Four Axiomatic Characterizations of the Integrated Gradients Attribution Method

IG2: Integrated Gradient on Iterative Gradient Path for Feature Attribution

Integrated Decision Gradients: Compute Your Attributions Where the Model Makes Its Decision

The Manifold Hypothesis for Gradient-Based Explanations

Integrated Gradient Correlation: a Dataset-wise Attribution Method

Isometric Immersion Learning with Riemannian Geometry

Markov-Lipschitz Deep Learning

On-Manifold Projected Gradient Descent

Visual Feature Attribution using Wasserstein GANs

A Geometrical Characterization on Feature Density of Image Datasets

A-FMI: Learning Attributions from Deep Networks via Feature Map Importance

Generalized Integrated Gradients: A practical method for explaining diverse ensembles

Strengthening Interpretability: An Investigative Study of Integrated Gradient Methods

Greedy PIG: Adaptive Integrated Gradients

Geometry-Aware Generative Autoencoders for Warped Riemannian Metric Learning and Generative Modeling on Data Manifolds

Deep Manifold Embedding of Attributed Graphs