Abstract: Artificial neural networks (ANNs) are considered the current best models of biological vision. ANNs are the best predictors of neural activity in the ventral stream; moreover, recent work has demonstrated that ANN models fitted to neuronal activity can guide the synthesis of images that drive pre-specified response patterns in small neuronal populations. Despite this success in predicting and steering firing activity, these results have not been connected with perceptual or behavioral changes. Here we propose an array of methods for creating minimal, targeted image perturbations that lead to changes in both neuronal activity and perception as reflected in behavior. We generated ‘deceptive images’ of human faces, monkey faces, and noise patterns so that they are perceived as a different, pre-specified target category, and measured both monkey neuronal responses and human behavior in response to these images. We found several effective methods for changing primate visual categorization that required much smaller image changes than untargeted noise. Our work shares its goal with adversarial attacks, namely the manipulation of images with minimal, targeted noise that leads ANN models to misclassify the images. Our results represent a valuable step in quantifying and characterizing the differences in perturbation robustness between biological and artificial vision.

a, b, The plots illustrate the similarity, among three visual systems, in the method-level (image-averaged) success of noise-level-10 deceptive images. We compared monkey neuron responses (‘Neural’), human behavior (‘MTurk’), and ANN model categorization (‘Model’). The similarity in the pattern of deception success was quantified by Pearson’s correlation across methods (a, b) or images (c, d). a, c, The success pattern in monkey neurons was compared to that in humans and the model. b, d, The success pattern in humans was compared to that in monkeys and the model. Markers indicate the correlation between visual systems (mean and bootstrap 95% CI of the mean) and the inter-subject split-half self-consistency.

The heat map shows the performance of ANN-based predictive models of monkey neuronal responses, in conditions of interpolation (testing on held-out images from trained categories) and extrapolation (testing on images from held-out categories). Model performance was quantified by the fraction of response predicted (see Methods). Categories included in training are indicated by small grey squares in the heat map. Results in e are summarized and combined over different deceptive image directions. Small dots indicate individual cell values as in e, color coded by image category. Larger dots with whiskers indicate the mean and bootstrap 95% CI of the mean within each training configuration, color coded by whether the result corresponds to interpolation (lighter color) or extrapolation (darker color) performance, and by whether the tested category was clean (blue) or deceptive (orange).
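The similarity measures named in the caption (Pearson's correlation between success patterns, a bootstrap 95% CI of the mean, and split-half self-consistency) can be illustrated with a minimal sketch. This is not the authors' analysis code: the function names, the random split into halves, the number of resamples, and the synthetic success rates below are all assumptions made for illustration.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation between two equal-length vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

def bootstrap_ci_of_mean(values, n_boot=2000, alpha=0.05, seed=0):
    """Mean of `values` with a percentile-bootstrap CI of that mean."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Resample the values with replacement, n_boot times.
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    boot_means = values[idx].mean(axis=1)
    lo, hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return float(values.mean()), float(lo), float(hi)

def split_half_consistency(data, n_splits=100, seed=0):
    """Correlate the mean success patterns of two random halves of the subjects.

    `data` has shape (n_subjects, n_methods). Random halving is an assumption.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n = data.shape[0]
    rs = []
    for _ in range(n_splits):
        perm = rng.permutation(n)
        half_a = data[perm[: n // 2]].mean(axis=0)
        half_b = data[perm[n // 2:]].mean(axis=0)
        rs.append(pearson_r(half_a, half_b))
    return rs

# Synthetic (hypothetical) success rates: 6 subjects x 8 methods.
rng = np.random.default_rng(1)
true_success = rng.uniform(0.1, 0.9, size=8)
neural = true_success + rng.normal(0, 0.05, size=(6, 8))
human_mean = true_success + rng.normal(0, 0.05, size=8)

# Cross-system similarity: one correlation per subject, then mean and CI.
r_values = [pearson_r(row, human_mean) for row in neural]
mean_r, ci_lo, ci_hi = bootstrap_ci_of_mean(r_values)

# Within-system self-consistency, summarized the same way.
self_r = split_half_consistency(neural)
mean_self, self_lo, self_hi = bootstrap_ci_of_mean(self_r)
```

Self-consistency is commonly read as a reference level for the cross-system correlations, since two systems are not expected to agree with each other more closely than one system agrees with itself.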

L-WISE: Boosting Human Image Category Learning Through Model-Based Image Selection And Enhancement

Capturing human categorization of natural images at scale by combining deep networks and cognitive models

Seeing eye-to-eye? A comparison of object recognition performance in humans and deep convolutional neural networks under image manipulation

Evaluating (and Improving) the Correspondence Between Deep Neural Networks and Human Representations

Learning From Brains How to Regularize Machines

Robustified ANNs Reveal Wormholes Between Human Category Percepts

Leveraging the Human Ventral Visual Stream to Improve Neural Network Robustness

Basic Level Categorization Facilitates Visual Object Recognition

Aligning Machine and Human Visual Representations across Abstraction Levels

Modeling Human Visual Search Performance on Realistic Webpages Using Analytical and Deep Learning Methods

Visual Concept Learning: Combining Machine Vision and Bayesian Generalization on Concept Hierarchies

Fooling the primate brain with minimal, targeted image manipulation

Extreme Image Transformations Affect Humans and Machines Differently

Generalizability analysis of deep learning predictions of human brain responses to augmented and semantically novel visual stimuli

Research on image classification leveraging deep convolutional neural networks and visual cognition

Soft Augmentation for Image Classification

Enhancing Image Description Generation through Deep Reinforcement Learning: Fusing Multiple Visual Features and Reward Mechanisms

Modeling Collaborator: Enabling Subjective Vision Classification With Minimal Human Effort via LLM Tool-Use

A Latent Variable Augmentation Method for Image Categorization with Insufficient Training Samples

Image edge enhancement for effective image classification

Increasing Interpretability of Neural Networks By Approximating Human Visual Saliency