Abstract: Over the past decade, dozens of studies involving thousands of children across several research disciplines have used the LENAⓇ system, which combines a daylong audio recorder with automated algorithmic analysis and aims to assess children's language environment. While the system's prevalence in the language acquisition domain is steadily growing, validation efforts have been scattered and have covered only some of its key characteristics. Here, we assess the LENAⓇ system's accuracy across all of its key measures: speaker classification, Child Vocalization Counts (CVC), Conversational Turn Counts (CTC), and Adult Word Counts (AWC). Our assessment is based on manual annotation of clips sampled randomly or periodically from daylong recordings collected from (a) populations similar to the system's original training data (North American English-learning children aged 3-36 months), (b) children learning another dialect of English (UK), and (c) slightly older children growing up in a different linguistic and socio-cultural setting (Tsimane' learners in rural Bolivia). We find reasonably high accuracy in some measures (AWC, CVC), with more problematic levels of performance in others (CTC, precision for male adults and other children). Statistical analyses do not support the view that performance is worse for children who are dissimilar from the LENAⓇ system's original training set. Whether LENAⓇ results are accurate enough for a given research, educational, or clinical application depends largely on the specifics at hand. We therefore conclude with a set of recommendations to help researchers make this determination for their goals.

Non-Native Children's Automatic Speech Recognition: the INTERSPEECH 2020 Shared Task ALTA Systems

The NTNU System at the Interspeech 2020 Non-Native Children's Speech ASR Challenge

Experiments of ASR-based mispronunciation detection for children and adult English learners

Automatic Speech Recognition of Non-Native Child Speech for Language Learning Applications

Error-preserving Automatic Speech Recognition of Young English Learners' Language

Data augmentation using prosody and false starts to recognize non-native children's speech

Developing an AI-Assisted Low-Resource Spoken Language Learning App for Children

Automatic Screening for Children with Speech Disorder using Automatic Speech Recognition: Opportunities and Challenges

A thorough evaluation of the Language Environment Analysis (LENA) system

Exploring Native and Non-Native English Child Speech Recognition With Whisper

A Computer-Assisted Tool for Automatically Measuring Non-Native Japanese Oral Proficiency

Evaluation of state-of-the-art ASR Models in Child-Adult Interactions

An open-source voice type classifier for child-centered daylong recordings

The SLT 2021 children speech recognition challenge: Open datasets, rules and baselines

Automatic recognition of child speech for robotic applications in noisy environments

TLT-school: a Corpus of Non Native Children Speech

Non-native Speaker Verification for Spoken Language Assessment

Accuracy of the language environment analysis (LENA) speech processing system for detecting communicative vocalizations of young children

Kid-Whisper: Towards Bridging the Performance Gap in Automatic Speech Recognition for Children VS. Adults

Reading Miscue Detection in Primary School through Automatic Speech Recognition

Investigating the Sensitivity of Automatic Speech Recognition Systems to Phonetic Variation in L2 Englishes