Generalizability analysis of deep learning predictions of human brain responses to augmented and semantically novel visual stimuli

Valentyn Piskovskyi, Riccardo Chimisso, Sabrina Patania, Tom Foulsham, Giuseppe Vizzari, Dimitri Ognibene
2024-10-06
Abstract: The purpose of this work is to investigate the soundness and utility of a neural network-based approach as a framework for exploring the impact of image enhancement techniques on visual cortex activation. In a preliminary study, we prepare a set of state-of-the-art brain encoding models, selected among the top 10 methods that participated in The Algonauts Project 2023 Challenge [16]. We analyze their ability to make valid predictions about the effects of various image enhancement techniques on neural responses. Given the impossibility of acquiring the actual data due to the high costs associated with brain imaging procedures, our investigation builds upon a series of experiments. Specifically, we analyze the ability of brain encoders to estimate the cerebral reaction to various augmentations by evaluating the response to augmentations targeting objects (i.e., faces and words) with known impact on specific areas. Moreover, we study the predicted activation in response to objects unseen during training, exploring the impact of semantically out-of-distribution stimuli. We provide relevant evidence for the generalization ability of the models forming the proposed framework, which appears to be promising for the identification of the optimal visual augmentation filter for a given task, for model-driven design strategies, and for AR and VR applications.
Computer Vision and Pattern Recognition, Artificial Intelligence, Human-Computer Interaction
What problem does this paper attempt to address?
This paper asks whether the impact of multiple image-enhancement techniques on visual-cortex activation in specific tasks can be studied without running a separate brain-imaging procedure for each technique. The authors use a set of pre-trained neural-network models built on publicly available datasets of fMRI-recorded brain responses to non-enhanced natural-scene images. With these models they predict the response of the visual cortex to new stimuli enhanced by different techniques, and in doing so evaluate how well the models generalize to novel and enhanced visual stimuli. The approach avoids an expensive scanning procedure for each enhancement technique, so the paper's main focus is the reliability, soundness, and performance of this substitute for direct measurement.
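The comparison described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the linear encoder, the contrast-boost augmentation, the ROI voxel indices, and every function name below are hypothetical stand-ins, chosen only to show the general pattern of predicting responses to an original and an augmented image and comparing them within a region of interest.

```python
import random
from statistics import mean


def make_toy_encoder(n_pixels, n_voxels, seed=0):
    """Build a hypothetical linear encoder: flat image -> predicted voxel responses.

    A real brain encoder would be a trained deep network; random weights are
    used here only so that the example is self-contained.
    """
    rng = random.Random(seed)
    weights = [[rng.gauss(0, 1) for _ in range(n_pixels)] for _ in range(n_voxels)]

    def predict(image):
        return [sum(w * p for w, p in zip(row, image)) for row in weights]

    return predict


def boost_contrast(image, gain=1.5):
    """Toy augmentation: stretch pixel values around the image mean, clipped to [0, 1]."""
    m = mean(image)
    return [min(1.0, max(0.0, m + gain * (p - m))) for p in image]


def roi_activation_change(predict, image, augment, roi):
    """Mean change in predicted response over the ROI's voxel indices."""
    base = predict(image)
    augmented = predict(augment(image))
    return mean(augmented[v] - base[v] for v in roi)


# Example: one random 64-pixel "image", 10 voxels, an assumed 3-voxel ROI.
rng = random.Random(1)
image = [rng.random() for _ in range(64)]
encoder = make_toy_encoder(n_pixels=64, n_voxels=10)
delta = roi_activation_change(encoder, image, boost_contrast, roi=[0, 1, 2])
print(f"mean predicted ROI change: {delta:.3f}")
```

In this sketch, a positive or negative `delta` only says that the toy encoder predicts a stronger or weaker ROI response after augmentation. Whether such predictions can be trusted for augmented or out-of-distribution stimuli is the generalization question the paper investigates.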