GSQA: An End-to-End Model for Generative Spoken Question Answering

Min-Han Shih, Ho-Lam Chung, Yu-Chi Pai, Ming-Hao Hsu, Guan-Ting Lin, Shang-Wen Li, Hung-yi Lee
2024-07-22
Abstract: In recent advancements in spoken question answering (QA), end-to-end models have made significant strides. However, previous research has primarily focused on extractive span selection. While this extractive-based approach is effective when answers are present directly within the input, it falls short in addressing abstractive questions, where answers are not directly extracted but inferred from the given information. To bridge this gap, we introduce the first end-to-end Generative Spoken Question Answering (GSQA) model that empowers the system to engage in abstractive reasoning. The challenge in training our GSQA model lies in the absence of a spoken abstractive QA dataset. We propose using text models for initialization and leveraging the extractive QA dataset to transfer knowledge from the text generative model to the spoken generative model. Experimental results indicate that our model surpasses the previous extractive model by 3% on extractive QA datasets. Furthermore, the GSQA model has only been fine-tuned on the spoken extractive QA dataset. Despite not having seen any spoken abstractive QA data, it can still closely match the performance of the cascade model. In conclusion, our GSQA model shows the potential to generalize to a broad spectrum of questions, thus further expanding the spoken question answering capabilities of abstractive QA. Our code is available at https://voidful.github.io/GSQA
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses a gap in Spoken Question Answering (SQA): existing methods focus mainly on extractive Question Answering (QA) and offer little support for abstractive QA. In extractive QA, the answer appears verbatim in the input, so the question can be answered by selecting a span. In abstractive QA, the answer does not appear in the input and must be inferred and synthesized from the given information. To close this gap, the paper proposes an end-to-end Generative Spoken Question Answering (GSQA) model that can handle abstractive QA, extending what SQA systems can do.

The main contributions of the paper are:

1. **Introducing the first end-to-end, text-free generative spoken question answering model**: The GSQA model generates spoken answers directly from spoken input, with no intermediate text.
2. **Establishing a pre-training and fine-tuning method**: The model is initialized from text models trained on text QA data and then fine-tuned on an extractive spoken QA dataset, which lets it handle abstractive spoken QA in a zero-shot setting.
3. **Demonstrating that the model is competitive on both extractive and abstractive QA**: In the reported experiments, GSQA surpasses the previous extractive model by 3% on extractive QA datasets and, despite never seeing spoken abstractive QA data, closely matches the performance of the cascade model.

Through these contributions, the paper aims to improve the robustness and generalization of spoken question answering systems so that they apply to a broader range of questions.
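The extractive/abstractive distinction described above can be illustrated with a small text-only sketch. This is not the paper's method (GSQA operates on speech, not on text); the function names `tokenize`, `find_span`, and `qa_type`, the simple regex tokenizer, and the example passage are all assumptions made for illustration. The idea is only that an answer counts as extractive when it occurs as a contiguous token span of the context.

```python
import re

def tokenize(text):
    """Lowercase and split into alphanumeric tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def find_span(context, answer):
    """Return (start, end) token indices of `answer` inside `context`, or None."""
    ctx, ans = tokenize(context), tokenize(answer)
    if not ans:
        return None
    n = len(ans)
    for i in range(len(ctx) - n + 1):
        if ctx[i:i + n] == ans:
            return (i, i + n)  # end index is exclusive
    return None

def qa_type(context, answer):
    """Label a QA pair 'extractive' if the answer is a span of the context, else 'abstractive'."""
    return "extractive" if find_span(context, answer) is not None else "abstractive"

# Hypothetical example passage, not taken from any dataset in the paper.
context = "Marie Curie won the Nobel Prize in Physics in 1903 and in Chemistry in 1911."

# "Which field was her first prize in?" -> the answer is a span of the passage.
print(qa_type(context, "Physics"))
# "How many Nobel Prizes did she win?" -> "two" must be inferred, it is not in the passage.
print(qa_type(context, "two"))
```

A span-selection model can only ever return answers of the first kind, which is why a generative model is needed for the second.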