Abstract: At around 7 months of age, human infants begin to reliably produce well-formed syllables containing both consonants and vowels, a behavior called canonical babbling. Over subsequent months, the frequency of canonical babbling continues to increase. How the infant's nervous system supports the acquisition of this ability is unknown. Here we present a computational model that combines a spiking neural network, reinforcement-modulated spike-timing-dependent plasticity, and a human-like vocal tract to simulate the acquisition of canonical babbling. As in human infants, the model's frequency of canonical babbling gradually increases. The model is rewarded when it produces a sound that is more auditorily salient than sounds it has previously produced. This is consistent with data from human infants indicating that contingent adult responses shape infant behavior and with data from deaf and tracheostomized infants indicating that hearing, including hearing one's own vocalizations, is critical for canonical babbling development. Reward receipt increases the level of dopamine in the neural network. The neural network contains a reservoir with recurrent connections and two motor neuron groups, one agonist and one antagonist, which control the masseter and orbicularis oris muscles, promoting or inhibiting mouth closure. The model learns to increase the number of salient, syllabic sounds it produces by adjusting the base level of muscle activation and increasing the muscles' range of activity. Our results support the possibility that through dopamine-modulated spike-timing-dependent plasticity, the motor cortex learns to harness its natural oscillations in activity in order to produce syllabic sounds. They thus suggest that learning to produce rhythmic mouth movements for speech production may be supported by general cortical learning mechanisms.
The model makes several testable predictions and has implications for our understanding not only of how syllabic vocalizations develop in infancy but also of how they may have evolved.
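
The reward and learning loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of a mean over past saliences as the comparison baseline, the eligibility trace, and all numeric parameters (`lr`, `w_max`, `decay`) are assumptions chosen only to show how a salience-contingent reward could gate weight changes through a dopamine-like signal.

```python
import numpy as np

def reward_signal(salience, past_saliences):
    """Return 1.0 if this sound is more salient than the average of
    earlier sounds, else 0.0. Comparing to the mean is an assumption."""
    if len(past_saliences) == 0:
        return 0.0  # nothing to compare against yet
    return 1.0 if salience > np.mean(past_saliences) else 0.0

def step_dopamine(dopamine, reward, decay=0.5):
    """Dopamine decays each step and rises when a reward arrives
    (hypothetical decay constant)."""
    return decay * dopamine + reward

def update_weights(weights, eligibility, dopamine, lr=0.1, w_max=4.0):
    """Dopamine-gated update: the weight change is proportional to
    dopamine times an STDP-derived eligibility trace (assumed form)."""
    new_weights = weights + lr * dopamine * eligibility
    return np.clip(new_weights, 0.0, w_max)

# One illustrative trial with made-up numbers.
past = [0.2, 0.4]
reward = reward_signal(0.5, past)          # 0.5 exceeds the mean of 0.3
dopamine = step_dopamine(0.0, reward)
weights = update_weights(np.array([1.0, 3.95]),
                         np.array([0.5, 1.0]),
                         dopamine)
```

In this sketch, synapses whose recent pre/post spike timing left a positive eligibility trace are strengthened only when dopamine is present, so activity patterns that preceded a salient sound become more likely to recur.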

Learning to Produce Syllabic Speech Sounds via Reward-Modulated Neural Plasticity
