Unsupervised deep learning identifies semantic disentanglement in single inferotemporal neurons

Irina Higgins, Le Chang, Victoria Langston, Demis Hassabis, Christopher Summerfield, Doris Tsao, Matthew Botvinick
DOI: https://doi.org/10.48550/arXiv.2006.14304
2020-06-25
Neurons and Cognition
Abstract: Deep supervised neural networks trained to classify objects have emerged as popular models of computation in the primate ventral stream. These models represent information with a high-dimensional distributed population code, implying that inferotemporal (IT) responses are also too complex to interpret at the single-neuron level. We challenge this view by modelling neural responses to faces in the macaque IT with a deep unsupervised generative model, beta-VAE. Unlike deep classifiers, beta-VAE "disentangles" sensory data into interpretable latent factors, such as gender or hair length. We found a remarkable correspondence between the generative factors discovered by the model and those coded by single IT neurons. Moreover, we were able to reconstruct face images using the signals from just a handful of cells. This suggests that the ventral visual stream may be optimising the disentangling objective, producing a neural code that is low-dimensional and semantically interpretable at the single-unit level.