TabSQLify: Enhancing Reasoning Capabilities of LLMs Through Table Decomposition

Md Mahadi Hasan Nahid, Davood Rafiei

2024-04-16

Abstract: Table reasoning is a challenging task that requires understanding both natural language questions and structured tabular data. Large language models (LLMs) have shown impressive capabilities in natural language understanding and generation, but they often struggle with large tables due to their limited input length. In this paper, we propose TabSQLify, a novel method that leverages text-to-SQL generation to decompose tables into smaller and relevant sub-tables, containing only essential information for answering questions or verifying statements, before performing the reasoning task. In our comprehensive evaluation on four challenging datasets, our approach demonstrates comparable or superior performance compared to prevailing methods reliant on full tables as input. Moreover, our method can reduce the input context length significantly, making it more scalable and efficient for large-scale table reasoning applications. Our method performs remarkably well on the WikiTQ benchmark, achieving an accuracy of 64.7%. Additionally, on the TabFact benchmark, it achieves a high accuracy of 79.5%. These results surpass other LLM-based baseline models on gpt-3.5-turbo (ChatGPT). TabSQLify can reduce the table size significantly, alleviating the computational load on LLMs when handling large tables without compromising performance.

Computation and Language, Databases, Information Retrieval

What problem does this paper attempt to address?

The paper addresses the difficulty of reasoning over large tables in natural language processing tasks. Large language models (LLMs) excel at natural language understanding and generation, but their limited input length makes large tables hard to handle. The paper proposes TabSQLify, which uses text-to-SQL generation to decompose a large table: the question or statement is translated into a SQL query, and running that query yields a small sub-table containing only the information needed to answer the question or verify the statement. The LLM then performs the reasoning task on this sub-table rather than on the full table, which shortens the input context and makes the approach more scalable and efficient for large-scale table reasoning. The authors evaluate the method on four challenging datasets and report performance comparable or superior to methods that take full tables as input. It reaches an accuracy of 64.7% on the WikiTQ benchmark and 79.5% on the TabFact benchmark, surpassing other LLM-based baseline models on gpt-3.5-turbo. TabSQLify also reduces table size significantly, easing the computational load on LLMs when processing large tables without sacrificing performance. The core of the method lies in leveraging the natural language understanding and generation capabilities of LLMs while reducing their burden in table encoding and reasoning.
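The decomposition step can be illustrated with a minimal sketch. This is not the authors' implementation: the table contents are invented toy data, the helper names `extract_subtable` and `linearize` are hypothetical, and the SQL string is hard-coded where the method would use a query generated by an LLM from the question. The sketch only shows how running a query against the full table yields a smaller sub-table that could then be placed in a reasoning prompt.

```python
import sqlite3


def extract_subtable(columns, rows, sql, table_name="T"):
    """Load a table into in-memory SQLite and run `sql` to obtain a sub-table.

    Hypothetical helper for illustration; `sql` stands in for a query that
    a text-to-SQL model would generate from the question.
    """
    conn = sqlite3.connect(":memory:")
    try:
        col_defs = ", ".join(f'"{c}"' for c in columns)
        conn.execute(f'CREATE TABLE "{table_name}" ({col_defs})')
        placeholders = ", ".join("?" for _ in columns)
        conn.executemany(
            f'INSERT INTO "{table_name}" VALUES ({placeholders})', rows
        )
        cur = conn.execute(sql)
        header = [d[0] for d in cur.description]
        return header, cur.fetchall()
    finally:
        conn.close()


def linearize(header, rows):
    """Render a (sub-)table as pipe-separated text for use in a prompt."""
    lines = [" | ".join(header)]
    lines += [" | ".join(str(v) for v in row) for row in rows]
    return "\n".join(lines)


# Invented toy table (assumption, not data from the paper).
columns = ["player", "team", "points"]
rows = [("A", "X", 27), ("B", "Y", 14), ("C", "X", 31), ("D", "Z", 9)]

question = "Which player on team X scored the most points?"

# Stand-in for LLM-generated SQL: keep only the rows and columns
# relevant to the question.
generated_sql = (
    "SELECT player, points FROM T WHERE team = 'X' ORDER BY points DESC"
)

header, sub_rows = extract_subtable(columns, rows, generated_sql)
prompt_table = linearize(header, sub_rows)
print(prompt_table)
```

Here the sub-table keeps two of the four rows and two of the three columns, so the text passed to the reasoning step is shorter than the linearized full table. In a real pipeline the generated SQL may be invalid or too restrictive, so some fallback to the original table would be needed; that handling is omitted from this sketch.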

NormTab: Improving Symbolic Reasoning in LLMs Through Tabular Data Normalization

Rethinking Tabular Data Understanding with Large Language Models

Tree-of-Table: Unleashing the Power of LLMs for Enhanced Large-Scale Table Understanding

TAP4LLM: Table Provider on Sampling, Augmenting, and Packing Semi-structured Data for Large Language Model Reasoning

ALTER: Augmentation for Large-Table-Based Reasoning

Large Language Models are few(1)-shot Table Reasoners

Effective Distillation of Table-based Reasoning Ability from LLMs

Large Language Models are Versatile Decomposers: Decompose Evidence and Questions for Table-based Reasoning

Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding

FLEXTAF: Enhancing Table Reasoning with Flexible Tabular Formats

OpenTab: Advancing Large Language Models as Open-domain Table Reasoners

TableLLM: Enabling Tabular Data Manipulation by LLMs in Real Office Usage Scenarios

Enhancing Temporal Understanding in LLMs for Semi-structured Tables

H-STAR: LLM-driven Hybrid SQL-Text Adaptive Reasoning on Tables

MATATA: A weakly-supervised MAthematical Tool-Assisted reasoning for Tabular Applications

Table Meets LLM: Can Large Language Models Understand Structured Table Data? A Benchmark and Empirical Study

TART: An Open-Source Tool-Augmented Framework for Explainable Table-based Reasoning

Lucy: Think and Reason to Solve Text-to-SQL

Interactive-T2S: Multi-Turn Interactions for Text-to-SQL with Large Language Models

TAT-LLM: A Specialized Language Model for Discrete Reasoning over Tabular and Textual Data