Recurrent neural networks as neuro-computational models of human speech recognition

Christian Brodbeck, Thomas Hannagan, James S. Magnuson
DOI: https://doi.org/10.1101/2024.02.20.580731
2024-02-22
Abstract: Human speech recognition transforms a continuous acoustic signal into categorical linguistic units by aggregating information that is distributed in time. It has been suggested that this kind of information processing may be understood through the computations of a Recurrent Neural Network (RNN) that receives input frame by frame, linearly in time, but builds an incremental representation of this input through a continually evolving internal state. While RNNs can simulate several key observations about human speech and language processing, it is unknown whether RNNs also develop computational dynamics that resemble human neural speech processing. Here we show that the internal dynamics of long short-term memory (LSTM) RNNs, trained to recognize speech from auditory spectrograms, predict human neural population responses to the same stimuli, beyond predictions from auditory features. Variations in the RNN architecture motivated by cognitive principles further improve this predictive power. Moreover, different components of hierarchical RNNs predict separable components of brain responses to speech in an anatomically structured manner, suggesting that RNNs reproduce a hierarchy of speech recognition in the brain. Our results suggest that RNNs provide plausible computational models of the cortical processes supporting human speech recognition.
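The idea of an RNN that "receives input frame by frame" while carrying "a continually evolving internal state" can be made concrete with a minimal sketch. It assumes a standard single-layer LSTM cell with random, untrained weights. The paper's networks were trained on auditory spectrograms. The `TinyLSTM` name, the layer sizes, and the toy input are hypothetical and only illustrate the mechanics. The sequence of hidden states it returns is the kind of time-varying internal dynamics that could, in principle, serve as predictors of neural responses.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Plain matrix-vector product on nested lists
    return [sum(w * x for w, x in zip(row, v)) for row in W]


class TinyLSTM:
    """Hypothetical single-layer LSTM with random (untrained) weights."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.n_hidden = n_hidden
        n_cat = n_in + n_hidden
        # One weight matrix and bias per gate: input (i), forget (f),
        # cell candidate (g), output (o); each acts on [frame, previous h]
        self.W = {
            k: [[rng.gauss(0.0, 0.1) for _ in range(n_cat)] for _ in range(n_hidden)]
            for k in "ifgo"
        }
        self.b = {k: [0.0] * n_hidden for k in "ifgo"}

    def step(self, x, h, c):
        # Concatenate the current input frame with the previous hidden state
        z = list(x) + list(h)
        pre = {k: [a + b for a, b in zip(matvec(self.W[k], z), self.b[k])] for k in "ifgo"}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        g = [math.tanh(v) for v in pre["g"]]
        o = [sigmoid(v) for v in pre["o"]]
        # Cell state: keep part of the old memory, add gated new content
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        # Hidden state: gated, squashed view of the cell state
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new

    def run(self, frames):
        # Frames are consumed one at a time, in temporal order; the
        # state (h, c) is the only memory carried from past frames
        h = [0.0] * self.n_hidden
        c = [0.0] * self.n_hidden
        states = []
        for x in frames:
            h, c = self.step(x, h, c)
            states.append(h)
        return states


# Hypothetical toy "spectrogram": 50 frames x 8 frequency bands
spectrogram = [[math.sin(0.1 * t + k) for k in range(8)] for t in range(50)]
states = TinyLSTM(n_in=8, n_hidden=4).run(spectrogram)
```

Because each state depends only on the frames seen so far, running the model on the first 10 frames reproduces the first 10 states of the full run. This is the incremental, causal property the abstract attributes to RNNs.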
Neuroscience