Abstract: Large Language Models (LLMs) trained on large volumes of data excel at various natural language tasks, but they cannot handle tasks that require knowledge they were not trained on. One solution is to use a retriever that fetches relevant information to expand the LLM's knowledge scope. However, existing text-oriented retrieval-based LLMs are not ideal for structured table data because of diversified data modalities and large table sizes. In this work, we propose OpenTab, an open-domain table reasoning framework powered by LLMs. Overall, OpenTab leverages a table retriever to fetch relevant tables and then generates SQL programs to parse the retrieved tables efficiently. Using the intermediate data derived from the SQL executions, it conducts grounded inference to produce accurate responses. Extensive experimental evaluation shows that OpenTab significantly outperforms baselines in both open- and closed-domain settings, achieving up to 21.5% higher accuracy. We further run ablation studies to validate the efficacy of the proposed system designs.

What problem does this paper attempt to address?

### What problems does this paper attempt to solve?

This paper addresses the challenges that large language models (LLMs) face when processing structured tabular data. Specifically, existing retrieval-based LLMs have the following problems when dealing with tabular data:

1. **Diverse data modalities and large tables**: Structured tables contain multiple data types, especially large volumes of numerical data or high-precision numbers, which leads to high token usage and strains the model's memory and compute.
2. **Complex table-relation understanding**: LLMs are mainly optimized for natural-language understanding and have difficulty parsing the complex relationships in tables well enough to perform data transformation and answer extraction.
3. **Limited maximum context length**: The context-length limit of LLMs makes it difficult to handle large-scale tables, especially tables containing millions of rows.

To solve these problems, the authors propose a framework named OpenTab, which handles table reasoning tasks in an open-domain setting. The main goals of OpenTab are:

- **Automatically identify and retrieve relevant tables**: Retrieve the tables relevant to a natural-language query from a large table corpus.
- **Generate SQL programs**: Parse the retrieved tables efficiently by generating high-quality SQL queries.
- **Reason based on intermediate data**: Use the intermediate data in the SQL execution results to conduct grounded reasoning and generate accurate answers.

In addition, OpenTab introduces the following key strategies to improve performance:

- **Generative Reranking & Sequential Reasoning (GRSR)**: Generates SQL queries and reranks tables according to query similarity, which mitigates LLM hallucination and improves prediction accuracy.
- **Simple-to-complex prompting strategy**: Generates SQL queries progressively from simple to complex, which explores a wider range of solutions and makes the system more robust.

With these methods, OpenTab significantly outperforms the baseline methods in both open-domain and closed-domain settings, especially when dealing with large-scale tabular data.
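The retrieve, generate-SQL, execute flow described above can be illustrated with a small sketch. This is not the authors' implementation: the toy corpus, the word-overlap retriever (a stand-in for a real retriever), the hard-coded `candidate_sql` function (a stand-in for LLM generation), and the choice to keep the most complex candidate that executes are all assumptions made for this example. The final grounded-inference step, in which an LLM would read the returned rows and compose the answer, is omitted.

```python
import sqlite3

# Hypothetical toy corpus: table name -> (column names, rows).
CORPUS = {
    "city_population": (["city", "population"],
                        [("Oslo", 700000), ("Bergen", 285000)]),
    "film_awards": (["film", "year"],
                    [("Parasite", 2019), ("Nomadland", 2020)]),
}


def tokenize(text):
    """Lowercase and split on whitespace, treating '_' and '?' as spaces."""
    return set(text.lower().replace("_", " ").replace("?", " ").split())


def retrieve(question, k=1):
    """Rank tables by word overlap with the question (stand-in retriever)."""
    q = tokenize(question)

    def score(name):
        cols, _ = CORPUS[name]
        return len(q & tokenize(name + " " + " ".join(cols)))

    return sorted(CORPUS, key=score, reverse=True)[:k]


def candidate_sql(table):
    """Stand-in for LLM output: candidates ordered from simple to complex."""
    return [
        f"SELECT * FROM {table}",
        f"SELECT city, population FROM {table}",
        f"SELECT city FROM {table} ORDER BY population DESC LIMIT 1",
    ]


def run_pipeline(question):
    """Retrieve one table, load it into SQLite, and run the SQL candidates."""
    table = retrieve(question, k=1)[0]
    cols, rows = CORPUS[table]
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE {table} ({', '.join(cols)})")
    placeholders = ", ".join("?" * len(cols))
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)

    result, chosen = None, None
    # Assumption of this sketch: try the most complex query first and
    # fall back to simpler ones if execution fails.
    for sql in reversed(candidate_sql(table)):
        try:
            result = conn.execute(sql).fetchall()
            chosen = sql
            break
        except sqlite3.Error:
            continue
    conn.close()
    return table, chosen, result


if __name__ == "__main__":
    print(run_pipeline("Which city has the largest population?"))
```

Executing the SQL inside the database means only the small result set, not the full table, would need to be passed to the model, which is the point of the approach when tables exceed the context window.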

OpenTab: Advancing Large Language Models as Open-domain Table Reasoners

Rethinking Tabular Data Understanding with Large Language Models

A Survey of Table Reasoning with Large Language Models

Table Meets LLM: Can Large Language Models Understand Structured Table Data? A Benchmark and Empirical Study

Unleashing the Potential of Large Language Models for Predictive Tabular Tasks in Data Science

Large Language Models are few(1)-shot Table Reasoners

Large Language Models(LLMs) on Tabular Data: Prediction, Generation, and Understanding -- A Survey

TART: An Open-Source Tool-Augmented Framework for Explainable Table-based Reasoning

TableLlama: Towards Open Large Generalist Models for Tables

TableLLM: Enabling Tabular Data Manipulation by LLMs in Real Office Usage Scenarios

Bridging the Gap: Deciphering Tabular Data Using Large Language Model

TabSQLify: Enhancing Reasoning Capabilities of LLMs Through Table Decomposition

Uncovering Limitations of Large Language Models in Information Seeking from Tables

Large Language Model for Table Processing: A Survey

TAP4LLM: Table Provider on Sampling, Augmenting, and Packing Semi-structured Data for Large Language Model Reasoning

Tree-of-Table: Unleashing the Power of LLMs for Enhanced Large-Scale Table Understanding

Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding

ALTER: Augmentation for Large-Table-Based Reasoning

TableRAG: Million-Token Table Understanding with Language Models

DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain Question Answering over Knowledge Base and Text