Speech-based Slot Filling using Large Language Models

Guangzhi Sun, Shutong Feng, Dongcheng Jiang, Chao Zhang, Milica Gašić, Philip C. Woodland
2023-11-13
Abstract: Recently, advancements in large language models (LLMs) have shown an unprecedented ability across various language tasks. This paper investigates the potential application of LLMs to slot filling with noisy ASR transcriptions, via both in-context learning and task-specific fine-tuning. Dedicated prompt designs and fine-tuning approaches are proposed to improve the robustness of LLMs for slot filling with noisy ASR transcriptions. Moreover, a linearised knowledge injection (LKI) scheme is also proposed to integrate dynamic external knowledge into LLMs. Experiments were performed on SLURP to quantify the performance of LLMs, including GPT-3.5-turbo, GPT-4, LLaMA-13B and Vicuna-13B (v1.1 and v1.5) with different ASR error rates. The use of the proposed fine-tuning together with the LKI scheme for LLaMA-13B achieved an 8.3% absolute SLU-F1 improvement compared to the strong Flan-T5-base baseline system on a limited data setup.
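The experiments compare LLMs on ASR transcriptions with different error rates. This summary does not say how the error rate is measured; assuming it is the standard word error rate (WER), the following minimal sketch shows how such a rate is computed from a reference transcript and an ASR hypothesis. The function name and the example utterance are illustrative, not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length.

    Computed with a standard Levenshtein dynamic programme over whitespace tokens.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # At the start of iteration i, prev[j] is the edit distance between the
    # first i-1 reference words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # Hypothetical utterance: one substituted word out of seven.
    ref = "wake me up at seven am tomorrow"
    hyp = "wake me up at eleven am tomorrow"
    print(f"WER = {word_error_rate(ref, hyp):.3f}")
```

A single misrecognised word such as "seven" becoming "eleven" barely moves the WER, yet it corrupts a time slot completely, which is why slot filling on ASR output needs to be evaluated separately from WER.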
Computation and Language, Sound, Audio and Speech Processing
What problem does this paper attempt to address?
The main problem this paper addresses is how to improve the performance of large language models (LLMs) on slot filling when the input is a noisy automatic speech recognition (ASR) transcription. Specifically, the research focuses on the following points:

1. **Evaluating LLM performance under different ASR error rates**: Using different ASR models (such as different versions of the Whisper model), the researchers evaluated the slot-filling performance of LLMs on transcriptions with different ASR error rates.
2. **Proposing improved prompt designs and fine-tuning methods**: To make LLMs more robust to noisy ASR transcriptions, the paper proposes dedicated prompt designs and data-efficient fine-tuning methods. These methods aim to use external dynamic knowledge to guide the generation process of LLMs and to reduce inaccurate information extraction caused by ASR errors.
3. **Introducing the Linearized Knowledge Injection (LKI) scheme**: LKI linearizes the contextual knowledge extracted from the N-best list into text and supplies it to the LLM as part of the prompt, providing constraints that guide language generation when the ASR transcription is noisy.
4. **Exploring slot filling with limited data**: The paper also examines how transfer learning and pre-trained language models can improve slot filling when only limited labeled data is available, in particular how to make the best use of a small amount of in-domain data during fine-tuning.
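LKI is described here only at a high level: knowledge is flattened into text and placed in the prompt alongside the ASR transcription. A minimal sketch of that idea follows, assuming the knowledge arrives as a dictionary from slot type to candidate entity strings. The prompt wording, the `slot_type = value` answer format and all function names (`linearise_knowledge`, `build_slot_filling_prompt`, `parse_slot_answer`) are hypothetical and are not the paper's actual prompt; the step that extracts the knowledge (from the N-best list, per the summary) is not shown.

```python
from typing import Dict, List


def linearise_knowledge(knowledge: Dict[str, List[str]]) -> str:
    """Flatten structured knowledge into one line of text, e.g.
    'person: anna smith; place_name: kings cross'.
    Slot types are sorted so the output is deterministic."""
    parts = []
    for slot_type in sorted(knowledge):
        values = ", ".join(knowledge[slot_type])
        parts.append(f"{slot_type}: {values}")
    return "; ".join(parts)


def build_slot_filling_prompt(asr_hypothesis: str,
                              knowledge: Dict[str, List[str]],
                              slot_types: List[str]) -> str:
    """Assemble a hypothetical prompt: instruction, slot inventory,
    linearised knowledge (if any), the ASR text and an answer format."""
    lines = [
        "Extract slot values from the noisy speech transcription below.",
        f"Possible slot types: {', '.join(slot_types)}.",
    ]
    if knowledge:
        # The linearised knowledge acts as a soft constraint: it tells the
        # model which entity spellings are plausible, which may help it
        # recover entities that the ASR system misrecognised.
        lines.append(f"Relevant knowledge: {linearise_knowledge(knowledge)}.")
    lines.append(f"Transcription: {asr_hypothesis}")
    lines.append("Answer with one 'slot_type = value' pair per line.")
    return "\n".join(lines)


def parse_slot_answer(answer: str) -> Dict[str, str]:
    """Parse 'slot_type = value' lines from a model reply, skipping any
    line that does not follow the requested format."""
    slots: Dict[str, str] = {}
    for line in answer.splitlines():
        if "=" not in line:
            continue
        slot_type, value = line.split("=", 1)
        slot_type, value = slot_type.strip(), value.strip()
        if slot_type and value:
            slots[slot_type] = value
    return slots


if __name__ == "__main__":
    # Hypothetical example: the ASR output has split "anna" into "and a".
    knowledge = {"person": ["anna smith"], "place_name": ["kings cross"]}
    prompt = build_slot_filling_prompt(
        "remind me to call and a smith at kings cross",
        knowledge,
        ["person", "place_name", "time"],
    )
    print(prompt)
    # A made-up model reply, parsed into a slot dictionary.
    print(parse_slot_answer("person = anna smith\nplace_name = kings cross"))
```

Keeping the knowledge as plain text means it can change per utterance without retraining, which matches the abstract's description of integrating dynamic external knowledge; the same prompt string could be sent to an API model or used as the input side of fine-tuning examples.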
In summary, the main objective of this paper is to improve the slot-filling ability of LLMs on noisy ASR transcriptions through improved prompt design, the LKI scheme and Low-Rank Adaptation (LoRA) fine-tuning, so as to achieve more accurate and efficient natural language understanding in practical applications.
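LoRA is named as the fine-tuning method. LoRA freezes a pre-trained weight matrix W and learns only a low-rank update BA, so an adapted layer computes h = Wx + (α/r)·BAx. The following pure-Python sketch shows that forward pass and the resulting saving in trainable parameters. The rank r = 8 and the scaling α are illustrative assumptions, not the settings used in the paper, and a real setup would use a deep-learning library rather than nested lists.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: Vector,
                 alpha: float, r: int) -> Vector:
    """Compute h = W x + (alpha / r) * B (A x).

    W (d_out x d_in) is the frozen pre-trained weight; only the low-rank
    factors A (r x d_in) and B (d_out x r) would be updated in training.
    """
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


def trainable_params(d_in: int, d_out: int, r: int) -> Tuple[int, int]:
    """Return (full fine-tuning count, LoRA count) for one weight matrix."""
    return d_out * d_in, r * (d_in + d_out)


if __name__ == "__main__":
    random.seed(0)
    d_in, d_out, r, alpha = 4, 3, 2, 16.0
    W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
    A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]
    # B starts at zero, so the adapted layer initially equals the frozen one.
    B = [[0.0] * r for _ in range(d_out)]
    x = [1.0, -0.5, 0.25, 2.0]
    assert lora_forward(W, A, B, x, alpha, r) == matvec(W, x)

    # One 5120 x 5120 projection (LLaMA-13B's hidden size), hypothetical rank 8.
    full, lora = trainable_params(5120, 5120, 8)
    print(full, lora, f"{lora / full:.4%}")
```

Because only A and B are trained, each adapted matrix needs a small fraction of the parameters of full fine-tuning, which makes tuning a 13B-parameter model on limited task data far cheaper in memory and storage.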