Abstract: People have long hoped for a conversational system that can assist in real-life situations, and recent progress on large language models (LLMs) is bringing this idea closer to reality. While LLMs often perform impressively, their efficacy in real-world scenarios that demand expert knowledge remains unclear. LLMs are believed to hold their greatest potential and value in education, especially in the development of artificial intelligence (AI)-based virtual teachers capable of facilitating language learning. We focus on evaluating the efficacy of LLMs in education, specifically in spoken language learning, which encompasses phonetics, phonology, and second language acquisition. We introduce a new multiple-choice question dataset to evaluate the effectiveness of LLMs in these scenarios, covering both understanding and application of spoken language knowledge. In addition, we investigate the influence of various prompting techniques: zero- and few-shot methods (the latter prepending the question with question-answer exemplars), chain-of-thought (CoT, thinking step by step), in-domain exemplars, and external tools (Google, Wikipedia). We conducted a large-scale evaluation of 20 distinct popular LLMs using these methods. We achieved significant performance improvements over the zero-shot baseline on practical-question reasoning (GPT-3.5, 49.1% -> 63.1%; LLaMA2-70B-Chat, 42.2% -> 48.6%). We found that models of different sizes have a good understanding of concepts in phonetics, phonology, and second language acquisition, but show limitations in reasoning about real-world problems. We also report preliminary findings on conversational communication.

Building a Taiwanese Mandarin Spoken Language Model: A First Attempt

Taiwan LLM: Bridging the Linguistic Divide with a Culturally Aligned Language Model

Measuring Taiwanese Mandarin Language Understanding

YuLan: An Open-source Large Language Model

Using Large Language Model for End-to-End Chinese ASR and NER

Large Language Model Can Transcribe Speech in Multi-Talker Scenarios with Versatile Instructions

Paralinguistics-Enhanced Large Language Modeling of Spoken Dialogue

Spoken Language Intelligence of Large Language Models for Language Learning

A Preliminary Study on Deep Learning-based Chinese Text to Taiwanese Speech Synthesis System

A Transcription Prompt-based Efficient Audio Large Language Model for Robust Speech Recognition

Multi-Level Modeling Units for End-to-End Mandarin Speech Recognition

Unveiling the Potential of LLM-Based ASR on Chinese Open-Source Datasets

Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model

Large Language Model based Situational Dialogues for Second Language Learning

Language Model Can Listen While Speaking

YAYI 2: Multilingual Open-Source Large Language Models

Bailing-TTS: Chinese Dialectal Speech Synthesis Towards Human-like Spontaneous Representation

Improving Mandarin End-to-End Speech Recognition with Word N-gram Language Model

Xmodel-LM Technical Report

Towards Joint Modeling of Dialogue Response and Speech Synthesis based on Large Language Model

FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs