Modeling early phonetic acquisition from child-centered audio data

Marvin Lavechin, Maureen de Seyssel, Marianne Métais, Florian Metze, Abdelrahman Mohamed, Hervé Bredin, Emmanuel Dupoux, Alejandrina Cristia
DOI: https://doi.org/10.1016/j.cognition.2024.105734
IF: 4.011
2024-04-01
Cognition
Abstract: Infants learn their native language(s) at an amazing speed. Before they even talk, their perception adapts to the language(s) they hear. However, the mechanisms responsible for this perceptual attunement and the circumstances in which it takes place remain unclear. This paper presents the first attempt to study perceptual attunement using ecological child-centered audio data. We show that a simple prediction algorithm exhibits perceptual attunement when applied to unrealistically clean audio-book data, but fails to do so when applied to ecologically valid child-centered data. In the latter scenario, perceptual attunement only emerges when the prediction mechanism is supplemented with inductive biases that force the algorithm to focus exclusively on speech segments while learning speaker-, pitch-, and room-invariant representations. We argue these biases are plausible given previous research on infants and non-human animals. More generally, we show that what our model learns and how it develops through exposure to speech depends exquisitely on the details of the input signal. By doing so, we illustrate the importance of considering ecologically valid input data when modeling language acquisition.
psychology, experimental