Struct-X: Enhancing Large Language Models Reasoning with Structured Data

Xiaoyu Tan, Haoyu Wang, Xihe Qiu, Yuan Cheng, Yinghui Xu, Wei Chu, Yuan Qi
2024-07-17
Abstract: Structured data, rich in logical and relational information, has the potential to enhance the reasoning abilities of large language models (LLMs). Still, its integration poses a challenge due to the risk of overwhelming LLMs with excessive tokens and irrelevant context information. To address this, we propose Struct-X, a novel framework that operates through five key phases, "read-model-fill-reflect-reason", efficiently enabling LLMs to utilize structured data. It begins by encoding structured data into a topological space using graph embeddings, followed by filling in missing entity information with knowledge retrieval modules, and filtering out irrelevant tokens via a self-supervised module. The final phase involves constructing a topological network with selected tokens to further reduce the total token length for more effective LLM inference. Additionally, Struct-X includes an Auxiliary Module trained to generate prompts, aiding LLMs in analyzing structured data. Extensive experiments on benchmarks, including the knowledge graph question-answer task and the long document reading comprehension task, show that Struct-X notably improves LLM reasoning, demonstrating the effectiveness of structured data augmentation in improving LLM inference with complex input context.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses the question of how to use structured data effectively to enhance the reasoning ability of large language models (LLMs). Existing methods often introduce a large amount of task-irrelevant information when converting structured knowledge graphs (KGs) into text sequences, which lowers the model's reasoning efficiency and accuracy. These methods also struggle to preserve the global topological structure of the knowledge graph. The paper proposes a new framework, **Struct-X**, which enables LLMs to use structured data efficiently for complex reasoning through five key stages: "read-model-fill-reflect-reason".

### Main contributions:

1. **Propose a novel framework, Struct-X**: Implement the "read-model-fill-reflect-reason" process on structured data, enabling LLMs to handle complex structured data effectively.
2. **Design a knowledge learning and filtering process**: Dynamically fill in the gaps in structured knowledge, and use the Self-Retrieval Generation module (Self-Reg) to filter the retrieved knowledge and verify its relevance, retaining valuable token information and reducing the learning burden on LLMs.
3. **Construct a dedicated graph network encoder**: Learn the latent features of associated tokens, achieve efficient cross-layer message passing in the Transformer, and design an Auxiliary Module that generates coherent prompts to improve answer quality.

### Method overview:

- **Topological knowledge injection**: Use a graph attention encoder (GAE) to process the input knowledge graph, learn missing knowledge through masking operations, and use a knowledge retrieval module to complete the graph embedding.
- **Knowledge and information retrieval**: Filter and verify the retrieved knowledge with the Self-Retrieval Generation module (Self-Reg) so that only the most relevant tokens are retained.
- **Graph topological encoder**: Capture the semantic and structural interactions between entities through multi-layer message passing, and compress the embeddings with a trainable dimension-reduction layer to make them more suitable for downstream tasks.
- **Auxiliary module**: Generate dynamic prompts to improve the consistency and coherence of the answers generated by LLMs.

### Experimental results:

- **WebQSP**: Struct-X reaches an accuracy of 75.13%, which is 2.65% higher than that of the previous best method, KoPA.
- **MetaQA**: The accuracy of Struct-X is 1.84% higher than that of the state-of-the-art method.
- **Family Tree and Travel Route**: The accuracies are 3.36% and 5.34% higher, respectively, than the top baseline results.

### Ablation study:

- **Influence of different functional modules**: Each module has a significant impact on different types of reasoning tasks. For example, on 1-hop single-fact questions, the complete Struct-X model achieves combinatorial reasoning through the multi-head attention mechanism, with an accuracy of 91.3%. On 2-hop and 3-hop multi-step reasoning, however, removing the knowledge retrieval and injection modules leads to a significant decline in performance.
- **Filtering and reflection mechanism**: Learned filtering with the Self-Reg module performs better than no filtering or random filtering, which indicates that the learned filtering mechanism is effective.

In conclusion, this paper addresses the shortcomings of existing methods for using structured data to enhance LLM reasoning by proposing the Struct-X framework, and it reports significant improvements on multiple benchmarks.
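
The filtering idea described above (keep only the retrieved knowledge most relevant to the question, rather than keeping everything or sampling at random) can be illustrated with a toy sketch. This is not the paper's Self-Reg module, which is learned; the sketch swaps in plain cosine similarity over hand-written vectors as a stand-in scorer. The function names `cosine` and `filter_by_relevance`, the `keep` parameter, and the example vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def filter_by_relevance(question_vec, candidates, keep=2):
    """Return the texts of the `keep` candidates most similar to the question.

    `candidates` is a list of (text, vector) pairs. Assumption: similarity to
    the question is used as the relevance score; a learned filter would
    replace `cosine` with a trained scoring model.
    """
    scored = [(cosine(question_vec, vec), text) for text, vec in candidates]
    # Highest relevance first; ties keep their original order (stable sort).
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:keep]]


# Toy data: hand-made 3-d vectors standing in for real embeddings.
question = [1.0, 0.0, 0.0]
retrieved = [
    ("unrelated triple", [0.0, 0.0, 1.0]),
    ("directly relevant triple", [0.9, 0.1, 0.0]),
    ("partly relevant triple", [0.5, 0.5, 0.0]),
]

kept = filter_by_relevance(question, retrieved, keep=2)
print(kept)
```

With these toy vectors the unrelated triple scores zero and is dropped, so fewer tokens would reach the LLM. A fixed `keep` budget is one simple design choice; a score threshold would be an equally reasonable alternative under the same assumptions.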