Parallel development of object recognition in newborn chicks and deep neural networks
Lalit Pandey, Donsuk Lee, Samantha M. W. Wood, Justin N. Wood
DOI: https://doi.org/10.1371/journal.pcbi.1012600
2024-12-03
PLoS Computational Biology
Abstract: How do newborns learn to see? We propose that visual systems are space-time fitters, meaning visual development can be understood as a blind fitting process (akin to evolution) in which visual systems gradually adapt to the spatiotemporal data distributions in the newborn's environment. To test whether space-time fitting is a viable theory for learning how to see, we performed parallel controlled-rearing experiments on newborn chicks and deep neural networks (DNNs), including CNNs and transformers. First, we raised newborn chicks in impoverished environments containing a single object, then simulated those environments in a video game engine. Second, we recorded first-person images from agents moving through the virtual animal chambers and used those images to train DNNs. Third, we compared the viewpoint-invariant object recognition performance of the chicks and DNNs. When DNNs received the same visual diet (training data) as chicks, the models developed the same object recognition skills as chicks. DNNs that used time as a teaching signal (space-time fitters) also showed patterns of successes and failures across the test viewpoints similar to those of chicks. Thus, DNNs can learn object recognition in the same impoverished environments as newborn animals. We argue that space-time fitters can serve as formal scientific models of newborn visual systems, providing image-computable models for studying how newborns learn to see from raw visual experiences.

Do machines learn like brains? The performance of all learning systems depends on both the learning machinery and the experiences (training data) from which the system learns, so answering this question requires giving machines and brains the same training data. To do so, we introduce a digital twin method for running parallel controlled-rearing studies on newborn animals and deep neural networks. We show that when deep neural networks (CNNs and transformers) are trained in the same visual environments as newborn chicks, the models develop the same object recognition skills as chicks. Both newborn chicks and deep neural networks can learn invariant object representations that generalize across novel viewpoints, even when learning occurs in an impoverished environment containing a single object seen from a limited 60° viewpoint range. Our study shows that blind fitting processes (variation + selection learning) can mimic the rapid visual learning of precocial newborn animals, in the absence of innate (hardcoded) knowledge about objects or space. We argue that visual development can be understood as space-time fitting, in which visual systems gradually adapt to the spatiotemporal data distributions in the environment.
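
The idea of using time as a teaching signal can be illustrated with a contrastive objective in which embeddings of temporally adjacent frames are pulled together and embeddings of other frames are pushed apart. The sketch below is only an illustration under stated assumptions, not the training code used in the study: the function name `temporal_contrastive_loss`, the pairing of frames (2k, 2k+1) as positives, the temperature value, and the use of plain NumPy arrays as stand-ins for network embeddings are all hypothetical choices made here.

```python
import numpy as np

def temporal_contrastive_loss(embeddings, temperature=0.1):
    """InfoNCE-style loss with consecutive frames (2k, 2k+1) as positive pairs.

    embeddings: array of shape (T, D), one row per consecutive video frame.
    Hypothetical helper for illustration; not taken from the paper.
    """
    z = np.asarray(embeddings, dtype=float)
    # L2-normalize rows so dot products are cosine similarities.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    n = len(z) // 2                      # number of complete frame pairs
    anchors = z[0:2 * n:2]               # frames 0, 2, 4, ...
    positives = z[1:2 * n:2]             # frames 1, 3, 5, ...
    # logits[i, j]: similarity of anchor pair i to the second frame of pair j.
    logits = anchors @ positives.T / temperature
    # Numerically stable log-softmax over the candidate frames in each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The correct candidate for anchor i is its own temporal neighbor (diagonal).
    return -np.mean(np.diag(log_probs))
```

Calling `temporal_contrastive_loss(frames)` on an array of per-frame embeddings returns a scalar that is low when temporally adjacent frames map to nearby embeddings and distinct moments map to distinct embeddings. In a full model this scalar would be minimized with respect to the encoder's weights, a step omitted here.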
biochemical research methods, mathematical & computational biology