Mechanisms of human dynamic object recognition revealed by sequential deep neural networks
Dorina de Jong, Lynn K. A. Sörensen, Sander M. Bohté, Heleen A. Slagter, H. Steven Scholte
DOI: https://doi.org/10.1371/journal.pcbi.1011169
2023-06-10
PLoS Computational Biology
Abstract: Humans can quickly recognize objects in a dynamically changing world. This ability is showcased by the fact that observers succeed at recognizing objects in rapidly changing image sequences, at up to 13 ms/image. To date, the mechanisms that govern dynamic object recognition remain poorly understood. Here, we developed deep learning models for dynamic recognition and compared different computational mechanisms, contrasting feedforward and recurrent, single-image and sequential processing as well as different forms of adaptation. We found that only models that integrate images sequentially via lateral recurrence mirrored human performance (N = 36) and were predictive of trial-by-trial responses across image durations (13–80 ms/image). Importantly, models with sequential lateral-recurrent integration also captured how human performance changes as a function of image presentation durations, with models processing images for a few time steps capturing human object recognition at shorter presentation durations and models processing images for more time steps capturing human object recognition at longer presentation durations. Furthermore, augmenting such a recurrent model with adaptation markedly improved dynamic recognition performance and accelerated its representational dynamics, thereby predicting human trial-by-trial responses using fewer processing resources. Together, these findings provide new insights into the mechanisms rendering object recognition so fast and effective in a dynamic visual world.
Author summary: Our visual world is both stable and dynamic: even within a single glance, a scene may change dramatically. Brains thus need to balance integration of information over time to create stable percepts with sensitivity to changes in sensory input, e.g., to rapidly recognize new objects. How do brains and, in particular, visual systems achieve this?
Here, we addressed this question by having humans and different neural network models perform the same object recognition task in which sequences of images were shown in rapid or slow succession. We observed that models that treat the images as a continuous sequence, integrating their processing over time, reproduced human performance patterns better than models that process one image at a time. Furthermore, models equipped with sensory adaptation, a form of stimulus habituation, better recognized objects in faster sequences and more efficiently captured human behaviour. These findings show that lateral recurrence and adaptation jointly enable object recognition across a wide variety of time scales, suggesting a critical role for these mechanisms in dynamic vision.
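The contrast between single-image and sequential processing, and the role of adaptation, can be illustrated with a toy example. The following is a minimal sketch, not the authors' model: the function `run_sequence`, its parameters (`steps_per_frame`, `adapt_rate`, `adapt_gain`, `reset_between_frames`), the single rectified layer, and the subtractive form of adaptation are all assumptions chosen for illustration.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def run_sequence(frames, w_in, w_lat, steps_per_frame,
                 adapt_rate=0.0, adapt_gain=0.0, reset_between_frames=False):
    """Toy one-layer lateral-recurrent network run over an image sequence.

    All names and update rules here are hypothetical, for illustration only.
    frames: list of input vectors, one per image.
    w_in, w_lat: feedforward and lateral weight matrices (lists of rows).
    steps_per_frame: number of recurrent time steps spent on each image.
    reset_between_frames: if True, state is cleared before each image
        (single-image processing); if False, state carries over
        (sequential processing).
    Returns the unit activations at every time step.
    """
    n_units = len(w_lat)
    h = [0.0] * n_units  # unit activations
    s = [0.0] * n_units  # adaptation state: running average of activity
    trace = []
    for x in frames:
        if reset_between_frames:
            h = [0.0] * n_units
            s = [0.0] * n_units
        feedforward = matvec(w_in, x)
        for _ in range(steps_per_frame):
            lateral = matvec(w_lat, h)
            # Assumed subtractive adaptation: recently active units are suppressed.
            h = [max(f + l - adapt_gain * a, 0.0)
                 for f, l, a in zip(feedforward, lateral, s)]
            # The adaptation state moves toward the current activity.
            s = [a + adapt_rate * (u - a) for a, u in zip(s, h)]
            trace.append(list(h))
    return trace
```

In this sketch, `steps_per_frame` loosely corresponds to processing each image for fewer or more time steps, and setting `reset_between_frames=True` turns the sequential model into one that treats every image in isolation. With nonzero `adapt_gain`, the response to a repeated image shrinks over time, which is the habituation-like behaviour the summary refers to.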
biochemical research methods, mathematical & computational biology