Fooling the primate brain with minimal, targeted image manipulation
Li Yuan, Will Xiao, Giorgia Dellaferrera, Gabriel Kreiman, Francis E.H. Tay, Jiashi Feng, Margaret S. Livingstone
DOI: https://doi.org/10.48550/arxiv.2011.05623
2020-01-01
Abstract: Artificial neural networks (ANNs) are considered the current best models of biological vision. ANNs are the best predictors of neural activity in the ventral stream; moreover, recent work has demonstrated that ANN models fitted to neuronal activity can guide the synthesis of images that drive pre-specified response patterns in small neuronal populations. Despite this success in predicting and steering firing activity, these results have not been connected with perceptual or behavioral changes. Here we propose an array of methods for creating minimal, targeted image perturbations that lead to changes in both neuronal activity and perception as reflected in behavior. We generated ‘deceptive images’ of human faces, monkey faces, and noise patterns so that they are perceived as a different, pre-specified target category, and measured both monkey neuronal responses and human behavioral responses to these images. We found several effective methods for changing primate visual categorization that required much smaller image changes than untargeted noise. Our work shares its goal with adversarial attacks, namely the manipulation of images with minimal, targeted noise that leads ANN models to misclassify the images. Our results represent a valuable step in quantifying and characterizing the differences in perturbation robustness between biological and artificial vision.
Figure caption: a, b, The plots illustrate the similarity among three visual systems in the method-level (image-averaged) success of noise-level-10 deceptive images. We compared monkey neuron responses (‘Neural’), human behavior (‘MTurk’), and ANN model categorization (‘Model’). The similarity in the pattern of deception success was quantified by Pearson’s correlation across methods (a, b) or images (c, d). a, c, The success pattern in monkey neurons was compared with that in humans and the model. b, d, The success pattern in humans was compared with that in monkeys and the model.
The scatter … and … indicate the correlation between visual systems (mean and bootstrap 95%-CI of the mean). The … and … indicate inter-subject split-half self-consistency (mean and …). The heat map shows the performance of ANN-based predictive models of monkey neuronal responses under interpolation (testing on held-out images from trained categories) and extrapolation (testing on images from held-out categories). Model performance was quantified by the fraction of response predicted (see Methods). Categories included in training are indicated by small grey squares in the heat map. Results in e are summarized and combined over different deceptive image directions. Small dots indicate individual cell values as in e, color-coded by image category. Larger dots with whiskers indicate the mean and bootstrap 95%-CI of the mean within each training configuration, color-coded by whether the result corresponds to interpolation (lighter color) or extrapolation (darker color) performance and whether the tested category was clean (blue) or deceptive (orange).
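The caption compares visual systems with Pearson’s correlation, reports a bootstrap 95%-CI, and sets these values against inter-subject split-half self-consistency. A minimal sketch of those three quantities follows, assuming one success score per item (method or image) for each system and a subjects-by-items score table for the split-half estimate. The function names, the item-level resampling scheme, the percentile interval, and the optional Spearman-Brown step are assumptions for illustration, not details taken from the paper’s Methods.

```python
import math
import random


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bootstrap_corr(x, y, n_boot=1000, seed=0):
    """Mean and percentile 95% CI of Pearson's r over bootstrap resamples.

    Assumption: items (methods or images) are resampled with replacement.
    """
    if len(set(x)) < 2 or len(set(y)) < 2:
        raise ValueError("each variable needs at least two distinct values")
    rng = random.Random(seed)
    n = len(x)
    rs = []
    while len(rs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        xs, ys = [x[i] for i in idx], [y[i] for i in idx]
        # Skip degenerate resamples in which either variable is constant.
        if len(set(xs)) < 2 or len(set(ys)) < 2:
            continue
        rs.append(pearson(xs, ys))
    rs.sort()
    lo = rs[round(0.025 * n_boot)]
    hi = rs[round(0.975 * n_boot) - 1]
    return sum(rs) / len(rs), (lo, hi)


def split_half_consistency(scores, n_splits=100, seed=0, spearman_brown=False):
    """Mean correlation between item-wise averages of two random halves of subjects.

    Assumed layout: scores[s][i] is subject s's score on item i.
    """
    rng = random.Random(seed)
    n_subj, n_items = len(scores), len(scores[0])
    rs = []
    for _ in range(n_splits):
        order = list(range(n_subj))
        rng.shuffle(order)
        half_a, half_b = order[: n_subj // 2], order[n_subj // 2 :]
        mean_a = [sum(scores[s][i] for s in half_a) / len(half_a) for i in range(n_items)]
        mean_b = [sum(scores[s][i] for s in half_b) / len(half_b) for i in range(n_items)]
        r = pearson(mean_a, mean_b)
        # Optional Spearman-Brown step-up toward full-sample reliability.
        rs.append(2 * r / (1 + r) if spearman_brown else r)
    return sum(rs) / len(rs)
```

With hypothetical per-method score lists `neural` and `mturk`, `bootstrap_corr(neural, mturk)` would return the mean r and its interval, while `split_half_consistency` gives the within-system reference value against which cross-system correlations can be read.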
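The abstract likens deceptive images to targeted adversarial attacks, in which an input is changed by minimal, targeted noise so that an ANN assigns it a chosen class. The abstract does not say how the deceptive images themselves were generated, so the sketch below illustrates only the generic ANN-side idea: an iterative targeted gradient attack on a hypothetical toy linear-softmax classifier, taking fixed-size steps that lower the cross-entropy loss of the target class until it becomes the top prediction. The classifier, function names, and parameters are all hypothetical.

```python
import math


def logits(W, b, x):
    """Class scores z = W x + b for a toy linear classifier."""
    return [sum(w * xi for w, xi in zip(row, x)) + bj for row, bj in zip(W, b)]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def predict(W, b, x):
    """Index of the highest-scoring class (the first index wins ties)."""
    z = logits(W, b, x)
    return max(range(len(z)), key=z.__getitem__)


def targeted_perturbation(W, b, x, target, step=0.05, max_iter=500):
    """Hypothetical targeted attack on the toy classifier.

    Repeatedly steps against the gradient of L = -log p_target with a fixed
    L2 step length until `target` is the top prediction or max_iter runs out.
    Returns the perturbed input and the L2 norm of the total perturbation.
    """
    x_adv = list(x)
    for _ in range(max_iter):
        if predict(W, b, x_adv) == target:
            break
        p = softmax(logits(W, b, x_adv))
        # dL/dz_j = p_j - [j == target] for the softmax cross-entropy loss.
        dz = [pj - (1.0 if j == target else 0.0) for j, pj in enumerate(p)]
        # Chain rule through z = W x + b: dL/dx = W^T dz.
        grad = [sum(W[j][i] * dz[j] for j in range(len(W))) for i in range(len(x))]
        norm = math.sqrt(sum(g * g for g in grad))
        if norm == 0.0:
            break  # no gradient signal; stop early
        x_adv = [xi - step * g / norm for xi, g in zip(x_adv, grad)]
    delta = math.sqrt(sum((a - o) ** 2 for a, o in zip(x_adv, x)))
    return x_adv, delta


W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
x = [1.0, 0.2]
x_adv, delta = targeted_perturbation(W, b, x, target=1)
print(predict(W, b, x), predict(W, b, x_adv), round(delta, 3))  # prints: 0 1 0.6
```

Because every step has the same L2 length, the perturbation norm is at most `step` times the number of steps taken; in this example twelve steps flip the prediction, giving a norm of about 0.6. The toy omits pixel-range clipping and any perceptual constraint, both of which matter for real images.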