Abstract: End-to-end models have recently made significant strides in spoken question answering (QA). However, previous research has primarily focused on extractive span selection. While this extractive approach is effective when answers appear directly in the input, it falls short on abstractive questions, where answers are not extracted directly but inferred from the given information. To bridge this gap, we introduce the first end-to-end Generative Spoken Question Answering (GSQA) model, which enables the system to perform abstractive reasoning. The main challenge in training GSQA is the absence of a spoken abstractive QA dataset. We propose using text models for initialization and leveraging an extractive QA dataset to transfer knowledge from the text generative model to the spoken generative model. Experimental results indicate that our model surpasses the previous extractive model by 3% on extractive QA datasets. Furthermore, GSQA has been fine-tuned only on a spoken extractive QA dataset. Despite never having seen spoken abstractive QA data, it still closely matches the performance of the cascade model. In conclusion, GSQA shows the potential to generalize to a broad spectrum of questions, extending spoken question answering to abstractive QA. Our code is available at https://voidful.github.io/GSQA
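To make the extractive/abstractive distinction concrete: an extractive model scores every context position as a possible answer start and end, then returns the best-scoring valid span, so its answer is always a substring of the input. The sketch below is a generic illustration of that decoding step, not code from the paper; the `best_span` helper, the `max_len` limit and the toy scores are all hypothetical. A generative model such as GSQA instead produces the answer token by token, so it is not restricted to spans.

```python
from typing import List, Tuple


def best_span(start_scores: List[float], end_scores: List[float],
              max_len: int = 30) -> Tuple[int, int]:
    """Return inclusive (start, end) indices maximizing start + end score,
    subject to start <= end < start + max_len (hypothetical helper)."""
    if len(start_scores) != len(end_scores):
        raise ValueError("score lists must have equal length")
    n = len(start_scores)
    best, best_score = (0, 0), float("-inf")
    for s in range(n):
        # Only spans that start at s, end no earlier, and respect max_len.
        for e in range(s, min(n, s + max_len)):
            score = start_scores[s] + end_scores[e]
            if score > best_score:
                best_score, best = score, (s, e)
    return best


# Toy context with made-up per-token scores (not model outputs).
tokens = ["the", "tower", "was", "built", "in", "1889"]
start = [0.1, 0.2, 0.0, 0.1, 0.3, 2.0]
end = [0.0, 0.1, 0.0, 0.2, 0.1, 2.5]
s, e = best_span(start, end)
print(" ".join(tokens[s:e + 1]))  # prints "1889"
```

Whatever the question, the output is some slice of `tokens`. A question such as "How old was the tower in 1900?" has no matching slice, which is the kind of abstractive case a generative model is meant to cover.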

What problem does this paper attempt to address?

The paper addresses a gap in Spoken Question Answering (SQA): existing methods focus mainly on extractive Question Answering (QA) and do not support abstractive QA. In extractive QA, the answer appears verbatim in the input, so the question can be answered by selecting a span of it. In abstractive QA, the answer must be inferred or synthesized from the given information and does not appear directly in the input. To close this gap, the paper proposes an end-to-end Generative Spoken Question Answering (GSQA) model that can handle abstractive QA, extending the capabilities of SQA. The main contributions are:

1. **Introducing the first end-to-end, text-free generative spoken question answering model**: GSQA generates spoken answers directly from spoken input, without an intermediate text representation.
2. **Establishing a pre-training and fine-tuning recipe**: by pre-training on a text QA dataset and then fine-tuning on an extractive spoken QA dataset, the model can handle abstractive spoken QA zero-shot, i.e. without any abstractive spoken training data.
3. **Demonstrating competitive performance on both extractive and abstractive QA**: GSQA outperforms the previous extractive model by 3% on extractive QA datasets and, on unseen abstractive spoken QA data, closely matches the performance of a cascade model.

Through these contributions, the paper aims to make spoken question answering systems more robust and better at generalizing, so that they apply to a wider range of questions and scenarios.
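The fine-tuning stage relies on the fact that an extractive example can also serve as a generative one: instead of predicting start and end positions, the model is trained to emit the answer itself. The sketch below shows that conversion for a SQuAD-style text record. The `ExtractiveExample` fields, the `question: ... context: ...` input template and the `to_seq2seq` helper are hypothetical, and plain strings stand in for the speech representations GSQA actually consumes; this is not the paper's preprocessing code.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ExtractiveExample:
    # SQuAD-style record: the answer is a character span of the context.
    question: str
    context: str
    answer_start: int
    answer_text: str


def to_seq2seq(ex: ExtractiveExample) -> Tuple[str, str]:
    """Turn a span-annotated example into an (input, target) pair
    for an encoder-decoder model (hypothetical template)."""
    end = ex.answer_start + len(ex.answer_text)
    # Sanity check: the stored offset must point at the stored answer.
    if ex.context[ex.answer_start:end] != ex.answer_text:
        raise ValueError("answer_start does not point at answer_text")
    source = f"question: {ex.question} context: {ex.context}"
    target = ex.answer_text  # the decoder learns to generate the span itself
    return source, target


ex = ExtractiveExample(
    question="When was the tower built?",
    context="The tower was built in 1889.",
    answer_start=23,
    answer_text="1889",
)
print(to_seq2seq(ex))
```

Because the target is free-form output rather than a pair of indices, the same model and loss can later be asked for answers that are not substrings of the context, which is what makes zero-shot evaluation on abstractive questions possible at all.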

GSQA: An End-to-End Model for Generative Spoken Question Answering
