Building a Taiwanese Mandarin Spoken Language Model: A First Attempt

Chih-Kai Yang, Yu-Kuan Fu, Chen-An Li, Yi-Cheng Lin, Yu-Xiang Lin, Wei-Chih Chen, Ho Lam Chung, Chun-Yi Kuan, Wei-Ping Huang, Ke-Han Lu, Tzu-Quan Lin, Hsiu-Hsuan Wang, En-Pei Hu, Chan-Jan Hsu, Liang-Hsuan Tseng, I-Hsiang Chiu, Ulin Sanga, Xuanjun Chen, Po-chun Hsu, Shu-wen Yang, Hung-yi Lee
2024-11-12
Abstract: This technical report presents our initial attempt to build a spoken large language model (LLM) for Taiwanese Mandarin, specifically tailored to enable real-time, speech-to-speech interaction in multi-turn conversations. Our end-to-end model uses a decoder-only transformer architecture and aims to achieve seamless interaction while preserving conversational flow, including full-duplex capabilities that allow simultaneous speaking and listening. The report also details the training process, including data preparation with synthesized dialogues and adjustments for real-time interaction. We also developed a platform to evaluate conversational fluency and response coherence in multi-turn dialogues. We hope the release of this report can contribute to the future development of spoken LLMs in Taiwanese Mandarin.
Computation and Language, Sound, Audio and Speech Processing