A model of early word acquisition based on realistic-scale audiovisual naming events

Khazar Khorrami, Okko Räsänen
2024-06-08
Abstract: Infants gradually learn to parse continuous speech into words and connect names with objects, yet the mechanisms behind development of early word perception skills remain unknown. We studied the extent to which early words can be acquired through statistical learning from regularities in audiovisual sensory input. We simulated word learning in infants up to 12 months of age in a realistic setting, using a model that solely learns from statistical regularities in unannotated raw speech and pixel-level visual input. Crucially, the quantity of object naming events was carefully designed to match that accessible to infants of comparable ages. Results show that the model effectively learns to recognize words and associate them with corresponding visual objects, with a vocabulary growth rate comparable to that observed in infants. The findings support the viability of general statistical learning for early word perception, demonstrating how learning can operate without assuming any prior linguistic capabilities.
Audio and Speech Processing, Artificial Intelligence, Computation and Language
What problem does this paper attempt to address?
This paper asks how infants, without direct supervision, gradually learn to recognize words in continuous speech and to associate those words with objects in the world around them. Specifically, the researchers simulated word learning in infants up to 12 months of age and tested whether words can be acquired solely from statistical regularities in unannotated raw speech and pixel-level visual input. The study aims to test the viability of general statistical learning mechanisms for early word perception, showing that learning can proceed without assuming any prior linguistic capabilities.

The key issues of the paper include:

1. **How infants recognize word boundaries in continuous speech streams**: Infants need to learn to segment continuous speech into individual words.
2. **How to associate words with visual objects**: Infants need to learn to map the words they hear onto the objects they see, even though these correspondences are highly ambiguous in everyday communication.
3. **The effectiveness of statistical learning mechanisms**: The researchers tested through simulation whether statistical learning remains effective when the number of object naming events is limited to what infants can plausibly access within their first year of life.

By exploring these issues, the researchers hope to better understand the cognitive mechanisms of early word learning in infants and to provide new evidence for theories of language acquisition.