Neural tracking of natural speech listening in children: temporal response function (TRF) approach
A. O. Rogachev, O. V. Sysoeva
DOI: https://doi.org/10.17816/gc623394
2023-12-15
Abstract: Speech development is crucial for a child’s mental growth and significantly affects later educational and professional achievement. It enables the child to interact with the external environment and to develop self-awareness and behavioral skills. The study of the mechanisms of speech development disorders, and the development of diagnostic and remediation strategies, is therefore essential.
Numerous cognitive and neurophysiological studies of speech and its disorders in children are currently under way. Electroencephalography (EEG) studies have demonstrated consistent evoked responses to speech-related auditory and visual stimuli, including individual phonemes and syllables, and have detected alterations in these responses in children with diagnosed speech disorders. The neurophysiological predictors and correlates of specific speech development disorders nevertheless remain a matter of debate. The evoked potential method requires isolated “ideal” stimuli and multiple repetitions of a single stimulus, which constrains experimental design. Brain responses to prolonged, “natural” stimuli may differ from those obtained with isolated stimuli, which could reduce the ecological validity of such studies.
In recent years, the temporal response function has become increasingly popular in speech research. This method enables estimating neurophysiological responses to continuous, natural, and ecologically valid stimuli [1–3]. When applied to speech research, this method allows for the study of the brain’s response to changes in acoustic, linguistic, and semantic characteristics present in natural narrative speech [1].
The mathematical basis of the temporal response function (TRF) is the solution of the equation:
$$w = (S^{T}S + \lambda E)^{-1} S^{T}R,$$
where S is the matrix of stimulus characteristics, R is the matrix of the neurophysiological signal recorded in response to the stimulus, λ is the regularization parameter, E is the identity matrix, and w is the temporal response function, a matrix of linear transformation coefficients from stimulus space to response space [1]. The TRF serves as a “bridge” between the stimulus and the neurophysiological response, as it reflects the neural operations that occur between the two. The S and R matrices incorporate time lags, which makes it possible to estimate the brain’s response to the presented stimulus within a specific time window.
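The equation above can be illustrated with a minimal NumPy sketch. It is not the mTRF Toolbox implementation, which may differ in details such as an intercept term or the scaling of the regularization. The helper names `lag_matrix` and `fit_trf`, the synthetic stimulus, the Gaussian "true" kernel, and the value of λ are all assumptions made for illustration only.

```python
import numpy as np

def lag_matrix(stim, lags):
    """Build S: column j holds the stimulus delayed by lags[j] samples."""
    n = len(stim)
    S = np.zeros((n, len(lags)))
    for j, lag in enumerate(lags):
        if lag >= 0:
            S[lag:, j] = stim[:n - lag]      # S[t, j] = stim[t - lag]
        else:
            S[:n + lag, j] = stim[-lag:]     # negative lag: future stimulus samples
    return S

def fit_trf(S, R, lam):
    """Ridge solution w = (S^T S + lam * E)^(-1) S^T R, with E the identity."""
    E = np.eye(S.shape[1])
    return np.linalg.solve(S.T @ S + lam * E, S.T @ R)

# Synthetic demonstration (hypothetical data, not the study's recordings).
rng = np.random.default_rng(0)
fs = 128                                                  # sampling rate, Hz
lags = np.arange(int(round(-0.2 * fs)), int(round(0.8 * fs)) + 1)
stim = rng.standard_normal(fs * 60)                       # one minute of a stand-in feature
S = lag_matrix(stim, lags)

# Assumed "true" response: a Gaussian bump peaking 100 ms after the stimulus.
w_true = np.exp(-0.5 * ((lags / fs - 0.1) / 0.03) ** 2)[:, None]
R = S @ w_true + 0.5 * rng.standard_normal((len(stim), 1))

w_hat = fit_trf(S, R, lam=1.0)                            # estimated TRF, shape (n_lags, 1)
```

Plotting `w_hat` against `lags / fs` gives the estimated response as a function of latency, which is how a TRF is usually inspected.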
The TRF has been used extensively in speech studies [2, 3], but few studies have applied it in research involving children [4, 5]. Ecologically valid speech stimuli make experimental paradigms easier for children to complete and allow brain responses to be evaluated for speech as it occurs in real-life situations, not only under experimentally created conditions. The TRF can be applied to both linguistic and acoustic features of speech, which makes it particularly attractive for studying the psychophysiological mechanisms of speech development in children with various developmental trajectories. We applied this approach in our study of speech development in children aged 3 to 8 years.
Fifty-six children (33 boys and 23 girls) aged 3 to 8 years (mean age 5.64 years, SD=1.33) participated in the study. Participants listened to three audio stories: a children’s story about hedgehogs and adapted versions of the tales “Brick and Wax” and “The Golden Duck”, all recorded in a female voice. All audio stimuli were accompanied by video to maintain the children’s attention. The total duration of the stimuli was 15 minutes. The audio stories were presented using Presentation® software (Neurobehavioral Systems, Inc., Berkeley, CA, USA). Comprehension was assessed by asking the children 8 “yes/no” questions after each story. In addition, on a different day of the study, the Preschool Language Scales, Fifth Edition (PLS-5) was used to assess each child’s current level of receptive and expressive speech development.
A 32-channel EEG was recorded using a Brain Products actiCHamp system (Brain Products GmbH, Gilching, Germany) with the reference electrode at FCz. EEG pre-processing was performed with the MNE library for Python and comprised band-pass filtering between 1 and 15 Hz, visual inspection of the recording for noisy channels, interpolation of bad channels (as needed), removal of oculomotor artifacts using independent component analysis, and re-referencing to the average reference. The EEG and the stimulus were synchronized using a marker at stimulus onset and were subsequently aligned within the corresponding epochs. Further processing was carried out in MATLAB (version 2021b) using the mTRF Toolbox [1]. The Toolbox’s functions were used to estimate the envelope of the speech stimulus, which then served as the input to the TRF. Both the stimulus and the EEG were downsampled to 128 Hz, and the analysis used a time window from –200 to 800 ms. The TRF prediction coefficient, that is, the correlation coefficient between the actual data and the data predicted by the model after training and cross-validation, was selected for analysis.
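As an illustration of how a cross-validated prediction coefficient of this kind can be obtained, the following NumPy sketch fits a ridge model on the training folds and correlates predicted with actual data on the held-out fold. It is a simplified stand-in, not the mTRF Toolbox procedure used in the study: the helper names, the number of folds, the fixed λ, and the synthetic 32-channel data are assumptions, and contiguous folds are used without trimming the lagged samples at fold boundaries.

```python
import numpy as np

def lagged(stim, lags):
    """Time-lagged design matrix: column j is the stimulus delayed by lags[j]."""
    n = len(stim)
    S = np.zeros((n, len(lags)))
    for j, lag in enumerate(lags):
        if lag >= 0:
            S[lag:, j] = stim[:n - lag]
        else:
            S[:n + lag, j] = stim[-lag:]
    return S

def ridge(S, R, lam):
    """Ridge-regularized least squares for all channels at once."""
    return np.linalg.solve(S.T @ S + lam * np.eye(S.shape[1]), S.T @ R)

def pearson_r(a, b):
    """Column-wise Pearson correlation between two (time, channels) arrays."""
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    return (a * b).sum(axis=0) / np.sqrt((a ** 2).sum(axis=0) * (b ** 2).sum(axis=0))

def cv_prediction_r(S, R, lam, n_folds=5):
    """Mean held-out correlation per channel across contiguous folds."""
    folds = np.array_split(np.arange(S.shape[0]), n_folds)
    scores = []
    for test_idx in folds:
        train = np.ones(S.shape[0], dtype=bool)
        train[test_idx] = False                  # hold this fold out
        w = ridge(S[train], R[train], lam)       # fit on the remaining folds
        scores.append(pearson_r(S[test_idx] @ w, R[test_idx]))
    return np.mean(scores, axis=0)

# Synthetic demonstration (hypothetical data, not the study's recordings).
rng = np.random.default_rng(1)
fs = 128
lags = np.arange(-26, 103)                       # about -200 to 800 ms at 128 Hz
feature = rng.standard_normal(fs * 60)           # stand-in for a speech envelope
S = lagged(feature, lags)
W_true = 0.05 * rng.standard_normal((len(lags), 32))
eeg = S @ W_true + rng.standard_normal((len(feature), 32))

r_per_channel = cv_prediction_r(S, eeg, lam=1.0) # one coefficient per channel
r_subject = r_per_channel.mean()                 # averaged across channels
```

Averaging `r_per_channel` over channels yields one value per participant, analogous to the intraindividual averaging described in the results. The synthetic signal-to-noise ratio here is arbitrary, so the magnitude of `r_subject` says nothing about values expected in real EEG.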
The mean prediction coefficient across the entire sample was 0.041 (range: –0.002 to 0.106). The coefficients differed significantly from zero (t(55)=13.1, p<0.001). In addition, the prediction coefficients averaged within each participant across all EEG channels correlated significantly and positively with participant age (r=0.379, p=0.004): the linear model underlying the TRF predicted the EEG signal better in older children.
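The significance of a Pearson correlation is conventionally tested by converting r to a t statistic with n−2 degrees of freedom, t = r·√(n−2)/√(1−r²). The short sketch below applies this to the age correlation reported above. It assumes that all 56 participants contributed to that correlation, which the text does not state explicitly, and the function name `t_from_r` is hypothetical.

```python
import math

def t_from_r(r, n):
    """t statistic (n - 2 degrees of freedom) for a Pearson correlation r."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Age correlation from the text; n = 56 is an assumption (full sample).
t_age = t_from_r(0.379, 56)
print(round(t_age, 2))  # 3.01
```

The two-sided p-value then follows from the t distribution with 54 degrees of freedom.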
A significant positive correlation was observed between the prediction coefficient values and the values on the receptive speech scale of the PLS-5 (r=0.33, p=0.026). In addition, PLS-5 scores were strongly correlated with age (r=0.596, p<0.001).
A positive correlation was also observed between the model prediction coefficient and the scores on the listening comprehension questionnaire (r=0.39, p=0.012). The questionnaire scores were in turn significantly associated with scores on the PLS-5 receptive speech scale (r=0.82, p<0.001) and with the age of the participants (r=0.51, p=0.001).
The prediction coefficient of the temporal response function reflects cortical tracking of the attended stimulus and is significantly associated with listening comprehension [2, 3]. Our results show significant positive correlations among children’s age, their speech comprehension as measured by the PLS-5, and their scores on the listening comprehension questionnaire administered immediately after the experimental task; the prediction coefficient was positively correlated with each of these measures. Thus, the temporal response function makes it possible to evaluate the capacity of the cerebral cortex to follow the acoustic signal of speech in children, and it yields neurophysiological markers of speech reception and comprehension. The experimental paradigm can be applied to identify neurophysiological correlates of receptive speech across age groups and in participants with varying levels of language and speech skills.
The experimental paradigm presented here is a component of research carried out by the Neurobiology of Oral and Written Speech in Developmental Disorders division at the Center for Cognitive Sciences, Sirius University. The authors extend their gratitude to the study participants and project team.