ReTAG: Reasoning Aware Table to Analytic Text Generation

Deepanway Ghosal, Preksha Nema, Aravindan Raghuveer
2023-10-30
Abstract: The task of table summarization involves generating text that both succinctly and accurately represents the table or a specific set of highlighted cells within a table. While significant progress has been made in table to text generation techniques, models still mostly generate descriptive summaries, which reiterate the information contained within the table in sentences. Through analysis of popular table to text benchmarks (ToTTo (Parikh et al., 2020) and InfoTabs (Gupta et al., 2020)) we observe that in order to generate the ideal summary, multiple types of reasoning are needed, coupled with access to knowledge beyond the scope of the table. To address this gap, we propose ReTAG, a table and reasoning aware model that uses vector-quantization to infuse different types of analytical reasoning into the output. ReTAG achieves 2.2% and 2.9% improvements on the PARENT metric in the relevant slices of ToTTo and InfoTabs, respectively, for the table to text generation task over state of the art baselines. Through human evaluation, we observe that output from ReTAG is up to 12% more faithful and analytical compared to a strong table-aware model. To the best of our knowledge, ReTAG is the first model that can controllably use multiple reasoning methods within a structure-aware sequence to sequence model to surpass state of the art performance in multiple table to text tasks. We extend the ToTTo and InfoTabs datasets with the reasoning categories used in each reference sentence, and open source 35.6K analytical and 55.9K descriptive instances.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
### Problems Addressed by the Paper

The paper "ReTAG: Reasoning Aware Table to Analytic Text Generation" aims to address several key issues in the task of table-to-text generation:

1. **Generating High-Quality Analytic Summaries**: Existing table-to-text generation models primarily produce descriptive summaries, which simply convert information from tables into sentences. However, to generate high-quality analytic summaries, models need to possess various reasoning abilities and be able to combine these reasoning skills to produce more in-depth and meaningful summaries.
2. **Multi-Category Reasoning**: In practical applications, different tables may require different types of reasoning. For example, financial charts often need numerical and temporal reasoning, while biographical information tables require more entity and common-sense knowledge. Therefore, the model needs to dynamically select the appropriate reasoning category based on the type of input table.
3. **Reasoning Control**: In different usage scenarios, the same table can have different summary styles. For instance, for a basketball game score table, experts might want to use temporal and tabular reasoning to summarize interesting patterns, while news articles might focus more on entity knowledge and a concise summary of the game results. Thus, the model needs to have explicit control capabilities during the reasoning process to adapt to different usage scenarios.
4. **Dataset Expansion**: Existing table-to-text datasets (such as ToTTo and InfoTabs) lack annotations for different reasoning categories. To better train and evaluate the model, the paper expands these datasets by adding annotations for the reasoning categories used in each reference sentence.

### Main Contributions

1. **Proposing a New Task**: The paper proposes a new task, Reasoning Aware Table to Text Generation, and validates its necessity in real-world application scenarios.
2. **Introducing the ReTAG Model**: The paper introduces the ReTAG model, which incorporates different reasoning skills into the output through vector quantization, thereby generating rich analytic summaries. The ReTAG model improves the PARENT metric by 2.2% and 2.9% on the ToTTo and InfoTabs datasets, respectively.
3. **Dataset Expansion**: The paper expands the ToTTo and InfoTabs datasets by adding annotations for five popular reasoning categories: numerical, temporal, common-sense, tabular reasoning, and entity knowledge.

### Method Overview

1. **Problem Definition**: The paper defines six reasoning categories (descriptive, tabular, numerical, temporal, common-sense, and entity reasoning) and formalizes the task of reasoning-aware table-to-text generation.
2. **Model Architecture**: The ReTAG model includes three modules: an encoder, a codebook based on vector quantization, and a decoder. Using vector quantization, the model can generate corresponding summaries based on the specified reasoning category.
3. **Pretraining Strategy**: To generate analytic sentences, the paper proposes a pretraining strategy that uses free-form and structured datasets containing specific reasoning components to enrich the representation of the codebook.
4. **Reasoning Control**: The paper further improves reasoning control by adding a classification loss to the intermediate activation layer, enabling the model to better distinguish between analytic and descriptive sentences.

### Experimental Results

1. **Performance Comparison**: Experimental results show that the ReTAG model significantly outperforms baseline models on the ToTTo and InfoTabs datasets, especially in generating analytic summaries.
2. **Ablation Study**: The paper validates the impact of the number of codebooks, intermediate activation classification, and pretraining strategy on model performance through ablation studies, demonstrating the importance of these components in improving model performance.

In summary, the paper addresses multiple key issues in the table-to-text generation task by proposing the ReTAG model and expanding the ToTTo and InfoTabs datasets with reasoning-category annotations, providing new methods and tools for generating high-quality analytic summaries.
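
To make the vector-quantization step in the Method Overview more concrete, here is a minimal sketch of a nearest-neighbour codebook lookup with one codebook per reasoning category. It is an illustration under stated assumptions and not the authors' implementation: the two-dimensional toy vectors, the `CODEBOOKS` table, and the helper names `quantize` and `quantize_sequence` are all hypothetical, and a real model would learn the code vectors jointly with the encoder and decoder.

```python
from typing import Dict, List, Tuple

Vector = List[float]

# Hypothetical toy codebooks: a few code vectors per reasoning category.
# In a trained model these would be learned parameters, not hand-picked values.
CODEBOOKS: Dict[str, List[Vector]] = {
    "numerical": [[1.0, 0.0], [0.8, 0.2]],
    "temporal": [[0.0, 1.0], [0.2, 0.8]],
    "descriptive": [[0.5, 0.5], [0.4, 0.6]],
}


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(hidden: Vector, category: str) -> Tuple[int, Vector]:
    """Return the index and value of the code nearest to one encoder state.

    The reasoning category selects which codebook is searched, which is how
    this sketch models explicit control over the reasoning type.
    """
    codebook = CODEBOOKS[category]
    index = min(
        range(len(codebook)),
        key=lambda i: squared_distance(hidden, codebook[i]),
    )
    return index, codebook[index]


def quantize_sequence(hidden_states: List[Vector], category: str) -> List[Vector]:
    """Quantize every encoder state; the result would be passed to the decoder."""
    return [quantize(h, category)[1] for h in hidden_states]


if __name__ == "__main__":
    encoder_states = [[0.95, 0.0], [0.7, 0.3]]
    for category in ("numerical", "temporal"):
        print(category, quantize_sequence(encoder_states, category))
```

Switching the `category` argument changes which codebook the same encoder states are snapped to, which mirrors, in a very simplified way, how a specified reasoning category can steer what the decoder receives.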