Drilling Down into the Discourse Structure with LLMs for Long Document Question Answering

Inderjeet Nair, Shwetha Somasundaram, Apoorv Saxena, Koustava Goswami

2023-11-23

Abstract: We address the task of evidence retrieval for long document question answering, which involves locating relevant paragraphs within a document to answer a question. We aim to assess the applicability of large language models (LLMs) to zero-shot long document evidence retrieval, owing to their unprecedented performance across various NLP tasks. However, LLMs can currently consume only limited context lengths as input, so providing document chunks as input might overlook the global context and miss inter-segment dependencies. Moreover, directly feeding large inputs can incur significant computational costs, particularly when processing the entire document (and potentially monetary expenses with enterprise APIs such as OpenAI's GPT variants). To address these challenges, we propose a suite of techniques that exploit the discourse structure commonly found in documents. By utilizing this structure, we create a condensed representation of the document, enabling a more comprehensive understanding and analysis of the relationships between its different parts. We retain 99.6% of the best zero-shot approach's performance while processing only 26% of the total tokens used by that approach in the information-seeking evidence retrieval setup. We also show how our approach can be combined with a self-ask reasoning agent to achieve the best zero-shot performance in complex multi-hop question answering, approximately 4% short of zero-shot performance using gold evidence.

Computation and Language, Artificial Intelligence, Information Retrieval

What problem does this paper attempt to address?

This paper addresses evidence retrieval in Long Document Question Answering (LDQA): locating the paragraphs within a long document that are relevant to answering a specific question. Long documents often exceed the maximum input length of existing Pre-trained Language Models (PLMs), and the required information may be scattered across different parts of the document, so extracting relevant information by processing the document directly is challenging. Processing the entire document to find an answer is also computationally expensive. The paper proposes a set of techniques that exploit the discourse structure commonly found in documents to build a condensed representation of the document, which lets the model reason about the relationships between different parts. These techniques reduce the total number of tokens that must be processed while retaining performance close to that of the best zero-shot method. The paper also explores how to combine this approach with a self-ask reasoning agent to achieve the best zero-shot performance in complex multi-hop question answering.
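The general idea of retrieving over a condensed, structure-aware view of a document can be illustrated with a minimal Python sketch. This is not the paper's implementation: the names `Section`, `condense`, `overlap_score`, and `retrieve` are hypothetical, the toy document is invented, and a simple word-overlap scorer stands in for the LLM's relevance judgment. The sketch assumes the document is already split into titled sections, builds one short outline line per section, picks the most relevant section from the outline, and only then reads that section's paragraphs in full.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Section:
    title: str
    paragraphs: List[str]


def words(text: str) -> set:
    """Lowercased alphanumeric tokens, punctuation dropped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def condense(sections: List[Section], lead_words: int = 5) -> List[str]:
    """One outline line per section: title plus the opening words of each paragraph."""
    lines = []
    for sec in sections:
        leads = [" ".join(p.split()[:lead_words]) for p in sec.paragraphs]
        lines.append(sec.title + ": " + " | ".join(leads))
    return lines


def overlap_score(question: str, text: str) -> float:
    """Toy scorer (assumption): fraction of question words found in the text.
    In an LLM-based pipeline this judgment would come from the model."""
    q = words(question)
    return len(q & words(text)) / max(len(q), 1)


def retrieve(question: str, sections: List[Section],
             top_sections: int = 1, top_paragraphs: int = 1) -> Tuple[List[str], int]:
    """Stage 1: rank sections using only the condensed outline.
    Stage 2: rank paragraphs inside the chosen sections.
    Returns the evidence and the number of paragraphs read in full."""
    outline = condense(sections)
    ranked = sorted(range(len(sections)),
                    key=lambda i: overlap_score(question, outline[i]),
                    reverse=True)
    candidates = [p for i in ranked[:top_sections] for p in sections[i].paragraphs]
    candidates.sort(key=lambda p: overlap_score(question, p), reverse=True)
    return candidates[:top_paragraphs], len(candidates)


# Invented toy document, for illustration only.
DOC = [
    Section("Installation", ["Unpack the device and connect the power cable.",
                             "Install the companion app on your phone."]),
    Section("Usage", ["Press the main button to start recording.",
                      "Recordings are saved to the memory card automatically."]),
    Section("Troubleshooting", ["To reset the device password hold the main button for ten seconds.",
                                "If the device does not power on check the cable."]),
]

evidence, paragraphs_read = retrieve("How do I reset the device password?", DOC)
print(evidence, paragraphs_read)
```

In this toy run only the paragraphs of the selected section are read in full. The outline itself also costs tokens, so any saving depends on paragraphs being much longer than their leads, which is typical of long documents but not of this small example.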

