ERATTA: Extreme RAG for Table To Answers with Large Language Models

Sohini Roychowdhury, Marko Krema, Anvar Mahammad, Brian Moore, Arijit Mukherjee, Punit Prakashchandra
2024-09-02
Abstract: Large language models (LLMs) with retrieval-augmented generation (RAG) have been the optimal choice for scalable generative AI solutions in the recent past. Although RAG implemented with AI agents (agentic-RAG) has recently been popularized, it suffers from unstable cost and unreliable performance for Enterprise-level data practices. Most existing use cases that incorporate RAG with LLMs have been either generic or extremely domain-specific, which calls the scalability and generalizability of RAG-LLM approaches into question. In this work, we propose a unique LLM-based system where multiple LLMs can be invoked to enable data authentication, user-query routing, data retrieval and custom prompting for question-answering capabilities from Enterprise data tables. The source tables here are highly fluctuating and large in size, and the proposed framework enables structured responses in under 10 seconds per query. Additionally, we propose a five-metric scoring module that detects and reports hallucinations in the LLM responses. Our proposed system and scoring metrics achieve >90% confidence scores across hundreds of user queries in the sustainability, financial health and social media domains. Extensions to the proposed extreme RAG architectures can enable heterogeneous source querying using LLMs.
Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
The paper addresses the unstable cost and unreliable performance of existing large language model (LLM) and retrieval-augmented generation (RAG) approaches in enterprise-level data practice. Agentic-RAG methods, which combine RAG with AI agents, can improve the quality of retrieved content, but on large, highly volatile enterprise data tables they are costly and slow, and they struggle to meet the needs of large numbers of users or user groups. Existing RAG-LLM methods also tend to be either too generic or narrowly domain-specific, which calls their scalability and generality into question.

To address this, the paper proposes a multi-LLM framework in which separate LLM calls handle data authentication, user-query routing, data retrieval, and custom prompting to answer questions from enterprise data tables. The framework targets four goals:

1. **Improve response speed**: return structured responses in under 10 seconds per query, even though the source tables are large and fluctuate frequently.
2. **Reduce hallucinations**: detect and report hallucinations in LLM responses with a five-metric scoring module; the system and metrics achieve >90% confidence scores across hundreds of user queries in the sustainability, financial health, and social media domains.
3. **Enhance scalability and generality**: decompose the RAG process into specific tasks (i.e., extreme RAG) to improve scalability and efficiency while reducing maintenance and operational costs.
4. **Support heterogeneous source queries**: extend the extreme RAG architecture so that LLMs can query heterogeneous sources, which the abstract presents as an extension of the proposed system.

Overall, the paper aims to provide faster, more accurate, more reliable, and more cost-effective question answering over enterprise data tables through an improved RAG method.
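The decomposition into separate stages (data authentication, query routing, data retrieval, custom prompting) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the table contents, the `ACL` and `ROUTES` dictionaries, every function name, and the stage order are hypothetical, and simple keyword matching stands in for the LLM calls the paper uses at each stage. The `numbers_grounded` function is one illustrative hallucination check only; it is not one of the paper's five metrics, which are not specified here.

```python
import re
from typing import Optional

# Hypothetical enterprise tables: table name -> list of row dicts.
TABLES = {
    "sustainability": [
        {"site": "Plant A", "year": 2023, "co2_tonnes": 1200},
        {"site": "Plant B", "year": 2023, "co2_tonnes": 950},
    ],
    "financial_health": [
        {"unit": "Retail", "year": 2023, "revenue_musd": 310},
        {"unit": "Cloud", "year": 2023, "revenue_musd": 540},
    ],
}

# Hypothetical access-control list: user -> tables that user may query.
ACL = {"alice": {"sustainability"}, "bob": {"sustainability", "financial_health"}}

# Hypothetical routing keywords standing in for an LLM-based query router.
ROUTES = {
    "sustainability": ("emission", "co2", "carbon"),
    "financial_health": ("revenue", "profit", "financial"),
}


def authenticate(user: str) -> set:
    """Stage 1 (stub): return the set of tables this user may query."""
    return ACL.get(user, set())


def route(query: str, allowed: set) -> Optional[str]:
    """Stage 2 (stub): pick the first allowed table whose keywords occur in the query."""
    q = query.lower()
    for table, keywords in ROUTES.items():
        if table in allowed and any(k in q for k in keywords):
            return table
    return None


def retrieve(table: str, query: str) -> list:
    """Stage 3: keep rows whose string values are mentioned in the query; else all rows."""
    q = query.lower()
    rows = TABLES[table]
    hits = [r for r in rows if any(isinstance(v, str) and v.lower() in q for v in r.values())]
    return hits or rows


def build_prompt(query: str, rows: list) -> str:
    """Stage 4: custom prompt that restricts the LLM to the retrieved rows."""
    context = "\n".join(str(r) for r in rows)
    return (
        "Answer using ONLY the rows below. If the answer is not present, say so.\n"
        f"Rows:\n{context}\nQuestion: {query}"
    )


def numbers_grounded(answer: str, rows: list) -> bool:
    """Illustrative check: every standalone number in the answer appears in the rows."""
    allowed = {str(v) for r in rows for v in r.values() if isinstance(v, (int, float))}
    return all(n in allowed for n in re.findall(r"\b\d+(?:\.\d+)?\b", answer))


def answer_pipeline(user: str, query: str) -> str:
    """Run stages 1-4 and return the prompt that would be sent to an LLM (call omitted)."""
    allowed = authenticate(user)
    if not allowed:
        return "ACCESS_DENIED"
    table = route(query, allowed)
    if table is None:
        return "NO_MATCHING_TABLE"
    return build_prompt(query, retrieve(table, query))
```

Keeping each stage as a small, separately testable function mirrors the paper's argument that splitting RAG into specific tasks makes the system easier to scale and maintain; in a real deployment each stub would be replaced by its own LLM call or query engine.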