TSpec-LLM: An Open-source Dataset for LLM Understanding of 3GPP Specifications

Rasoul Nikbakht, Mohamed Benzaghta, Giovanni Geraci

2024-06-04

Abstract: Understanding telecom standards involves sorting through numerous technical documents, such as those produced by the 3rd Generation Partnership Project (3GPP), which is time-consuming and labor-intensive. While large language models (LLMs) can assist with the extensive 3GPP knowledge base, an inclusive dataset is crucial for their effective pre-training and fine-tuning. In this paper, we introduce TSpec-LLM, an open-source comprehensive dataset covering all 3GPP documents from Release 8 to Release 19 (1999-2023). To evaluate its efficacy, we first select a representative sample of 3GPP documents, create corresponding technical questions, and assess the baseline performance of various LLMs. We then incorporate a retrieval-augmented generation (RAG) framework to enhance LLM capabilities by retrieving relevant context from the TSpec-LLM dataset. Our evaluation shows that using a naive-RAG framework on TSpec-LLM improves the accuracy of GPT-3.5, Gemini 1.0 Pro, and GPT-4 from 44%, 46%, and 51% to 71%, 75%, and 72%, respectively.
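The retrieve-then-prompt flow that the abstract calls naive RAG can be illustrated with a minimal sketch. This is not the authors' pipeline: the bag-of-words cosine similarity stands in for a learned embedding model, the function names (`chunk`, `retrieve`, `build_prompt`) are hypothetical, and the toy passages are made-up placeholders rather than 3GPP text.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk(text, size=40, overlap=10):
    """Split text into overlapping word windows (sizes are illustrative)."""
    words = text.split()
    step = size - overlap
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, stop, step)]


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question."""
    q_vec = Counter(tokenize(question))
    ranked = sorted(
        chunks,
        key=lambda c: cosine(q_vec, Counter(tokenize(c))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, context_chunks):
    """Assemble the prompt that would be sent to the LLM (assumed wording)."""
    context = "\n---\n".join(context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Made-up placeholder passages, not real specification text.
passages = [
    "toy passage about random access procedure timers",
    "toy passage about handover measurement reports",
    "toy passage about channel coding for control information",
]
question = "Which timers apply to the random access procedure?"
top = retrieve(question, passages, k=1)
prompt = build_prompt(question, top)
```

In a real setting the passages would come from chunking the specification documents, and the prompt would be passed to the chosen LLM; only the retrieval and prompt-assembly steps are shown here.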

Networking and Internet Architecture

What problem does this paper attempt to address?

The paper addresses the difficulty of understanding telecommunications standards documents. Its objectives are:

1. **Creating a comprehensive dataset**: It introduces TSpec-LLM, an open-source dataset covering all 3GPP (3rd Generation Partnership Project) documents from Release 8 to Release 19 (1999 to 2023). The dataset retains the content of the original tables and formulas and has a total size of approximately 13.5 GB.
2. **Evaluating the performance of large language models (LLMs)**: It builds a set of technical questions from 3GPP specifications to assess how well current mainstream LLMs (GPT-3.5, GPT-4, and Gemini 1.0 Pro) answer complex questions about telecommunications standards.
3. **Enhancing LLM capabilities**: It applies a retrieval-augmented generation (RAG) framework that retrieves relevant context from the TSpec-LLM dataset, improving the ability of LLMs to answer complex domain-specific questions.
4. **Validating the effectiveness of the method**: Without RAG, the evaluated LLMs reach only about 44% to 51% accuracy. With RAG, accuracy rises to 71% to 75%, especially on questions that require a deep understanding of 3GPP standards.

In summary, the paper aims to make large language models more effective in the telecommunications field by creating a comprehensive dataset and applying RAG, particularly for understanding and answering complex questions about 3GPP standards.
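The size of the improvement can be worked out directly from the accuracy figures reported in the abstract. The short sketch below computes the gain per model in percentage points and as a relative increase; the function name `gains` is only an illustrative choice.

```python
# Accuracy figures (percent) as reported in the abstract.
baseline = {"GPT-3.5": 44, "Gemini 1.0 Pro": 46, "GPT-4": 51}
with_rag = {"GPT-3.5": 71, "Gemini 1.0 Pro": 75, "GPT-4": 72}


def gains(before, after):
    """Return absolute (points) and relative (percent) gains per model."""
    result = {}
    for model, base in before.items():
        delta = after[model] - base
        result[model] = {
            "absolute_points": delta,
            "relative_percent": round(100 * delta / base, 1),
        }
    return result


summary = gains(baseline, with_rag)
for model, g in summary.items():
    print(f"{model}: +{g['absolute_points']} points "
          f"({g['relative_percent']}% relative)")
```

The absolute gains are 27, 29, and 21 percentage points for GPT-3.5, Gemini 1.0 Pro, and GPT-4, so the model with the highest baseline accuracy benefits least from retrieval in this evaluation.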
