Abstract: Deep neural networks have become increasingly successful at solving classic perception problems such as object recognition, semantic segmentation, and scene understanding, often reaching or surpassing human-level accuracy. This success is due in part to the ability of DNNs to learn useful representations of high-dimensional inputs, a problem that humans must also solve. We examine the relationship between the representations learned by these networks and human psychological representations recovered from similarity judgments. We find that deep features learned in service of object classification account for a significant amount of the variance in human similarity judgments for a set of animal images. However, these features do not capture some qualitative distinctions that are a key part of human representations. To remedy this, we develop a method for adapting deep features to align with human similarity judgments, resulting in image representations that can potentially be used to extend the scope of psychological experiments.
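The abstract's first finding, that deep features account for a significant share of the variance in human similarity judgments, can be made concrete with a short sketch. It rests on two assumptions that the text does not state: feature-based similarity is taken to be the inner product of two images' feature vectors, and "variance explained" is taken to be the squared Pearson correlation over all distinct image pairs. The function names are hypothetical.

```python
import numpy as np


def upper_triangle(m: np.ndarray) -> np.ndarray:
    """Flatten the entries above the diagonal of a square matrix (one per image pair)."""
    i, j = np.triu_indices(m.shape[0], k=1)
    return m[i, j]


def variance_explained(features: np.ndarray, human_sim: np.ndarray) -> float:
    """Squared Pearson correlation between feature-based and human similarities.

    features:  (n_images, n_dims) activations from a pre-trained network.
    human_sim: (n_images, n_images) symmetric matrix of mean similarity ratings.
    Assumption: model similarity is the inner product of feature vectors;
    self-similarities on the diagonal are ignored.
    """
    model_sim = features @ features.T
    r = np.corrcoef(upper_triangle(model_sim), upper_triangle(human_sim))[0, 1]
    return float(r ** 2)
```

Only the off-diagonal pairs are compared because an image's similarity to itself carries no information about how people relate different images.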

What problem does this paper attempt to address?

The paper aims to understand how the feature representations learned by deep neural networks (DNNs) relate to human mental representations, and to adapt deep network features so that they capture those representations better. Specifically:

1. **Problem Background**:
   - Deep neural networks have achieved remarkable success on classic perception problems such as object recognition, semantic segmentation, and scene understanding, sometimes surpassing human-level performance.
   - These networks learn effective representations of high-dimensional inputs, a problem that humans must also solve when processing complex visual information.
   - However, although deep learning models perform well on these tasks, whether the representations they learn are consistent with human mental representations remains an open question.
2. **Research Objectives**:
   - **Evaluate the consistency between deep features and human mental representations**: compare the features extracted by deep neural networks with human judgments of image similarity to assess whether those features explain human mental representations.
   - **Adapt deep features to better match human mental representations**: develop a method that adjusts deep network features so that they align more closely with human similarity judgments.
3. **Specific Problems**:
   - **Initial evaluation**: deep features explain human similarity judgments to some extent but fail to capture some key qualitative distinctions.
   - **Improvement method**: apply a linear transformation to the deep features so that they better predict human similarity judgments.
4. **Experimental Design**:
   - **Dataset**: 120 animal photographs serve as stimuli.
   - **Behavioral experiment**: 71,400 similarity ratings were collected through Amazon Mechanical Turk.
   - **Feature extraction**: features are extracted from three pre-trained CNNs (CaffeNet, VGG16, and GoogLeNet).
   - **Performance evaluation**: model performance is measured by the correlation between feature-based similarities and human similarity judgments.
5. **Results**:
   - **Initial evaluation**: the unadjusted deep features are moderately to highly correlated with human similarity judgments but fail to capture key structure in human mental representations.
   - **After adaptation**: once the deep features are adjusted by the linear transformation, performance improves significantly, most notably for VGG16, whose adapted features explain 84% of the variance.
6. **Discussion**:
   - **Effectiveness of the method**: a simple linear transformation is enough to bring deep features closer to human mental representations.
   - **Potential applications**: the method gives cognitive science a new tool for estimating human mental representations directly from raw sensory inputs such as pixels.
   - **Future work**: how well the method generalizes to other stimuli and tasks still needs to be verified.

In conclusion, the paper aims to capture human mental representations better by adapting deep neural network features, providing new perspectives and tools for both cognitive science and artificial intelligence research.
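The summary describes the adaptation only as a linear transformation of the deep features. One way to realize it, sketched here as an assumption rather than as the authors' exact procedure, is to learn one weight per feature dimension so that the weighted inner product `sum_k w_k * f_ik * f_jk` predicts the human similarity of images `i` and `j`. Because that prediction is linear in the weights, it can be fit in closed form with ridge regression. The diagonal form, the ridge penalty, and all function names below are illustrative choices.

```python
import numpy as np


def pair_products(features: np.ndarray):
    """Rows f_i * f_j (elementwise) for every image pair i < j, with the pair indices."""
    i, j = np.triu_indices(features.shape[0], k=1)
    return features[i] * features[j], i, j


def fit_diagonal_weights(features: np.ndarray, human_sim: np.ndarray,
                         ridge: float = 1.0) -> np.ndarray:
    """Fit one weight per feature dimension so that sum_k w_k f_ik f_jk ~ human_sim[i, j].

    Closed-form ridge regression: w = (X^T X + ridge * I)^(-1) X^T y,
    where each row of X is f_i * f_j and y holds the matching human similarities.
    """
    X, i, j = pair_products(features)
    y = human_sim[i, j]
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + ridge * np.eye(d), X.T @ y)


def adapted_similarity(features: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Similarity matrix under the learned reweighting, F diag(w) F^T."""
    # Broadcasting scales column k of the feature matrix by w_k before the product.
    return (features * w) @ features.T
```

For the 120-image set described above, `X` would have 7,140 rows (one per distinct image pair) and one column per feature dimension. A diagonal weighting keeps the parameter count at the feature dimensionality rather than its square. In practice the ridge strength would be chosen, and the fit evaluated, on held-out pairs, because in-sample variance explained overstates how well the adapted features generalize.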

Adapting Deep Network Features to Capture Psychological Representations
