Abstract: Logging is a critical function in modern distributed applications, but the lack of standardization in log query languages and formats creates significant challenges. Developers currently must write ad hoc queries in platform-specific languages, which requires expertise in both the query language and application-specific log details, an impractical expectation given the variety of platforms and the volume of logs and applications. While generating these queries with large language models (LLMs) seems intuitive, we show that current LLMs struggle with log-specific query generation because they lack exposure to domain-specific knowledge. We propose a novel natural language (NL) interface to address these inconsistencies and aid log query generation, enabling developers to create queries in a target log query language by providing NL inputs. We further introduce NL2QL, a manually annotated, real-world dataset of natural language questions paired with corresponding LogQL queries spread across three log formats, to promote the training and evaluation of NL-to-log query systems. Using NL2QL, we subsequently fine-tune and evaluate several state-of-the-art LLMs and demonstrate their improved capability to generate accurate LogQL queries. We perform further ablation studies to demonstrate the effect of additional training data and the transferability across different log formats. In our experiments, fine-tuned models show up to a 75% improvement in generating LogQL queries compared to non-fine-tuned models.

What problem does this paper attempt to address?

This paper addresses the standardization and usability problems of log query languages, such as LogQL, in modern distributed applications. Specifically, developers face the following challenges when writing log queries on different platforms:

1. **Lack of standardization**: Each log platform uses its own query language, and there is almost no syntactic overlap between these languages. Developers must therefore learn a new query language whenever they switch platforms.
2. **Complex log formats**: Log data comes in complex and diverse formats and structures, and developers need a deep understanding of the specific log format to write accurate queries. For example, when building a Grafana dashboard, developers must know the label names, the label values, and the exact log line syntax (such as "sshd[" and ": session opened for").
3. **Steep learning curve**: Existing log query tools are hard to learn, and only a few technical experts can use them to their full extent. New team members often spend a great deal of time getting familiar with these tools.
4. **Lack of context information**: In a DevOps environment, the person who writes the query (usually an operator) and the person who writes the code that emits the log (usually a developer) are often different people, so the operator lacks sufficient context when writing the query.

To address these problems, the authors propose a natural language interface that generates accurate log queries by fine-tuning Large Language Models (LLMs). Specifically, the authors created a manually annotated dataset named NL2QL, which contains natural language questions and their corresponding LogQL queries. Using this dataset, the authors fine-tuned several state-of-the-art LLMs to improve their ability to generate accurate LogQL queries. Through this method, the authors hope to achieve the following goals:

- Provide an intuitive and easy-to-use interface that lets developers generate the required LogQL queries from natural language input.
- Lower the barrier to entry for log queries, enabling more people to perform log analysis.
- Improve the accuracy and efficiency of queries by reducing syntax errors and improving label matching and time aggregation.

In summary, this paper aims to solve the standardization and usability problems in log queries by fine-tuning LLMs, thereby improving the efficiency and accessibility of log analysis.
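To make the second challenge concrete, the following minimal sketch shows how much format-specific detail a hand-written LogQL query encodes. It is an illustration only, not the authors' method: the helper `build_logql` and the label `job="sshd"` are hypothetical, while the two line fragments are the ones quoted in the summary. The sketch assumes the filter strings contain no characters that need escaping.

```python
def build_logql(labels, line_filters, window=None):
    """Assemble a LogQL query string (hypothetical helper for illustration)."""
    # Stream selector: {name="value", ...} requires exact label names and values.
    selector = "{" + ", ".join(f'{k}="{v}"' for k, v in labels.items()) + "}"
    # Each |= stage keeps only the lines that contain the given substring.
    pipeline = "".join(f' |= "{s}"' for s in line_filters)
    query = selector + pipeline
    # Optionally wrap the log query in a range aggregation that counts matches.
    if window is not None:
        query = f"count_over_time({query} [{window}])"
    return query


# The label below is an assumption; a real deployment defines its own labels.
labels = {"job": "sshd"}
filters = ["sshd[", ": session opened for"]

print(build_logql(labels, filters))
print(build_logql(labels, filters, window="5m"))
```

A question such as "how many SSH sessions were opened in the last five minutes?" carries none of these details, which is the gap the natural language interface is meant to close.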

Chatting with Logs: An exploratory study on Finetuning LLMs for LogQL