Adapting Deep Network Features to Capture Psychological Representations

Joshua C. Peterson, Joshua T. Abbott, Thomas L. Griffiths
DOI: https://doi.org/10.48550/arXiv.1608.02164
2016-08-07
Abstract: Deep neural networks have become increasingly successful at solving classic perception problems such as object recognition, semantic segmentation, and scene understanding, often reaching or surpassing human-level accuracy. This success is due in part to the ability of DNNs to learn useful representations of high-dimensional inputs, a problem that humans must also solve. We examine the relationship between the representations learned by these networks and human psychological representations recovered from similarity judgments. We find that deep features learned in service of object classification account for a significant amount of the variance in human similarity judgments for a set of animal images. However, these features do not capture some qualitative distinctions that are a key part of human representations. To remedy this, we develop a method for adapting deep features to align with human similarity judgments, resulting in image representations that can potentially be used to extend the scope of psychological experiments.
Computer Vision and Pattern Recognition, Artificial Intelligence, Neural and Evolutionary Computing
What problem does this paper attempt to address?
This paper examines how the feature representations learned by deep neural networks (DNNs) relate to human mental representations, and how those features can be adapted to capture human mental representations more closely. Specifically:

1. **Problem Background**:
   - Deep neural networks have achieved remarkable success on classic perception problems (such as object recognition, semantic segmentation, and scene understanding), sometimes even surpassing human-level accuracy.
   - These networks learn useful feature representations of high-dimensional inputs, a problem that humans must also solve when processing complex visual information.
   - However, although deep learning models perform well on certain tasks, whether the feature representations they learn are consistent with human mental representations remains an open question.
2. **Research Objectives**:
   - **Evaluate the Consistency between Deep Features and Human Mental Representations**: Compare the features extracted by deep neural networks with human judgments of image similarity to evaluate how well these features explain human mental representations.
   - **Improve Deep Features to Better Match Human Mental Representations**: Develop a method for adjusting deep network features so that they align more closely with human similarity judgments.
3. **Specific Problems**:
   - **Initial Evaluation**: Deep features explain human similarity judgments to a certain extent, but fail to capture some key qualitative distinctions.
   - **Improvement Method**: Adjust the deep features through a linear transformation so that they better predict human similarity judgments.
4. **Experimental Design**:
   - **Dataset**: 120 animal photos serve as stimuli.
   - **Behavioral Experiment**: 71,400 similarity ratings were collected through Amazon Mechanical Turk.
   - **Feature Extraction**: Three pre-trained CNNs (CaffeNet, VGG16, and GoogLeNet) are used to extract features.
   - **Performance Evaluation**: Model performance is measured by the correlation between similarities computed from deep features and human similarity judgments.
5. **Results**:
   - **Initial Evaluation**: The unadjusted deep features are moderately to highly correlated with human similarity judgments, but fail to capture key structure in human mental representations.
   - **Improved Evaluation**: After the deep features are adjusted through a linear transformation, model performance improves significantly. The VGG16 model in particular can explain 84% of the variance.
6. **Discussion**:
   - **Effectiveness of the Method**: A simple linear transformation is enough to bring deep features closer to human mental representations.
   - **Potential Applications**: The method provides a new tool for cognitive science research and can be used to estimate human mental representations from raw sensory inputs (such as pixels).
   - **Future Work**: The generalization of the method to different stimuli and tasks still needs to be verified.

In conclusion, this paper aims to better capture human mental representations by adapting deep neural network features, thereby providing new perspectives and tools for cognitive science and artificial intelligence research.
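
The evaluation and adjustment steps summarized above can be illustrated with a small sketch. The summary only says that the features are adjusted "through linear transformation", so the specific form used here is an assumption: similarity between two images is modeled as a weighted inner product of their feature vectors, with one weight per feature fitted by ridge regression. The data are synthetic stand-ins for CNN features and human ratings, and the names `pairwise_products`, `fit_feature_weights`, and `r_squared` are hypothetical helpers, not code from the paper.

```python
import numpy as np


def pairwise_products(features):
    """Elementwise products f_i * f_j for every unordered image pair (i < j).

    Returns an array of shape (n_pairs, n_features). Summing a row gives the
    plain inner product of the two images' feature vectors.
    """
    i, j = np.triu_indices(features.shape[0], k=1)
    return features[i] * features[j]


def r_squared(predicted, target):
    """Squared Pearson correlation between predicted and target similarities."""
    return np.corrcoef(predicted, target)[0, 1] ** 2


def fit_feature_weights(products, similarities, ridge=1.0):
    """Ridge regression for per-feature weights w with similarities ≈ products @ w.

    Assumption: the linear adjustment is a diagonal reweighting of features.
    """
    n_features = products.shape[1]
    gram = products.T @ products + ridge * np.eye(n_features)
    return np.linalg.solve(gram, products.T @ similarities)


# Synthetic stand-ins (NOT the paper's data): 40 "images" with 8 features each.
rng = np.random.default_rng(0)
n_images, n_features = 40, 8
features = rng.normal(size=(n_images, n_features))
products = pairwise_products(features)

# Pretend human judgments depend on only two of the eight features, plus noise.
true_weights = np.array([3.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
human_similarities = products @ true_weights + 0.1 * rng.normal(size=products.shape[0])

# Baseline: unweighted inner products of the raw features.
raw_r2 = r_squared(products.sum(axis=1), human_similarities)

# Adapted: reweight the features to match the (synthetic) human judgments.
weights = fit_feature_weights(products, human_similarities, ridge=1e-3)
adapted_r2 = r_squared(products @ weights, human_similarities)

print(f"pairs: {products.shape[0]}")
print(f"raw R^2: {raw_r2:.3f}, adapted R^2: {adapted_r2:.3f}")
```

The fit here is scored on the same pairs it was trained on, which flatters the adapted model; a real analysis would score it on held-out pairs or with cross-validation. Note also that 120 images give 120 × 119 / 2 = 7,140 unordered pairs, so the 71,400 ratings reported above would correspond to ten ratings per pair if every pair was rated equally often.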