Abstract: The human visual system can selectively attend to parts of a scene for quick perception, a biological mechanism known as human attention. Inspired by this, recent deep learning models encode attention mechanisms to focus on the most task-relevant parts of the input signal for further processing, which is called machine/neural/artificial attention. Understanding the relation between human and machine attention is important for interpreting and designing neural networks. Many works claim that the attention mechanism offers an extra dimension of interpretability by explaining where neural networks look. However, recent studies demonstrate that artificial attention maps do not always coincide with common intuition. In view of this conflicting evidence, we present a systematic study of using artificial attention and human attention in neural network design. Using three example computer vision tasks, diverse representative backbones and well-known architectures, corresponding real human gaze data, and large-scale quantitative studies, we quantify the consistency between artificial attention and human visual attention. We also offer novel insights into existing artificial attention mechanisms by giving preliminary answers to several key questions about human and artificial attention. Overall, the results demonstrate that human attention can serve as a meaningful "ground truth" benchmark in attention-driven tasks, where the closer the artificial attention is to human attention, the better the performance; for higher-level vision tasks, the relationship varies case by case. For attention-driven tasks, it would therefore be advisable to explicitly enforce better alignment between artificial and human attention to boost performance; such alignment would also improve network explainability for higher-level computer vision tasks.

Connecting Concept Convexity and Human-Machine Alignment in Deep Neural Networks

Aligning Machine and Human Visual Representations across Abstraction Levels

Evaluating (and Improving) the Correspondence Between Deep Neural Networks and Human Representations

Evaluating alignment between humans and neural network representations in image-based learning tasks

Exploring Counterfactual Alignment Loss towards Human-centered AI

Dimensions underlying the representational alignment of deep neural networks with humans

Abstraction Alignment: Comparing Model and Human Conceptual Relationships

Achieving More Human Brain-Like Vision via Human EEG Representational Alignment

Approaching human 3D shape perception with neurally mappable models

Towards Utilising a Range of Neural Activations for Comprehending Representational Associations

Teaching CORnet Human fMRI Representations for Enhanced Model-Brain Alignment

Concept Alignment

A large-scale examination of inductive biases shaping high-level visual representation in brains and machines

Understanding More about Human and Machine Attention in Deep Neural Networks

Human Visual Pathways for Action Recognition versus Deep Convolutional Neural Networks: Representation Correspondence in Late but Not Early Layers

When Does Perceptual Alignment Benefit Vision Representations?

Neural alignment predicts learning outcomes in students taking an introduction to computer science course

Neural Dynamics of Object Manifold Alignment in the Ventral Stream

Human-Like Geometric Abstraction in Large Pre-trained Neural Networks

Beyond Scalars: Concept-Based Alignment Analysis in Vision Transformers

A Theory of Neural Tangent Kernel Alignment and Its Influence on Training