Abstract: Current publicly available knowledge work data collections lack diversity, extensive annotations, and contextual information about the users and their documents. These issues hinder objective and comparable data-driven evaluations and optimizations of knowledge work assistance systems. Due to the considerable resources needed to collect such data in real-life settings and the necessity of data censorship, collecting such a dataset appears nearly impossible. For this reason, we propose a configurable, multi-agent knowledge work dataset generator. This system simulates collaborative knowledge work among agents producing Large Language Model-generated documents and accompanying data traces. Additionally, the generator captures all background information, given in its configuration or created during the simulation process, in a knowledge graph. Finally, the resulting dataset can be utilized and shared without privacy or confidentiality concerns. This paper introduces our approach's design and vision and focuses on generating authentic knowledge work documents using Large Language Models. Our study, in which human raters assessed 53% of the generated and 74% of the real documents as realistic, demonstrates the potential of our approach. Furthermore, we analyze the authenticity criteria mentioned in the participants' comments and elaborate on potential improvements for identified common issues.

What problem does this paper attempt to address?

This paper aims to address the shortcomings of existing knowledge work datasets, specifically:

1. **Lack of diversity**: Publicly available knowledge work datasets lack diversity and cannot comprehensively reflect the situations that occur in actual work scenarios.
2. **Inadequate annotation**: Existing datasets usually lack extensive annotations, which makes it difficult to evaluate and optimize knowledge work assistance systems.
3. **Lack of contextual information**: These datasets often lack background information about users and their documents, which hinders objective and comparable data-driven evaluations.
4. **Privacy and confidentiality issues**: Collecting such data in real-life settings requires considerable resources, and the data must be censored to protect privacy and confidentiality, which further reduces its completeness and usability.

To solve these problems, the authors propose a configurable multi-agent knowledge work dataset generator (KnoWoGen). The system simulates collaborative knowledge work among multiple agents and produces documents generated by Large Language Models (LLMs) together with accompanying data traces. In addition, the generator captures all background information, whether given in its configuration or created during the simulation, in a knowledge graph. The resulting dataset can be used and shared without privacy or confidentiality concerns.

### Main methods

The main design ideas of KnoWoGen are as follows:

- **Multi-agent simulation**: Simulate multiple knowledge workers completing tasks, creating documents, and collaborating.
- **Document generation by Large Language Models**: Generate documents by prompting LLMs, with the goal of producing diverse and authentic documents.
- **Knowledge graph storage of background information**: Store all background information in a knowledge graph so that all context-related details are retained.
- **Configurability**: Engineers can configure the generator according to the requirements of the tools they want to evaluate or optimize, and so create suitable evaluation datasets.

### Experimental verification

To check whether the generated documents are realistic, the authors conducted a study in which human raters assessed both generated and real documents. 53% of the generated documents were rated as fairly to very realistic, while 74% of the real documents received the same rating. Although the authenticity of the generated documents leaves room for improvement, the result indicates that KnoWoGen has potential.

### Summary

This paper addresses the shortcomings of existing knowledge work datasets by proposing the KnoWoGen system, a new way to generate diverse knowledge work datasets that support more effective data-driven evaluation and optimization.
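The design ideas listed under "Main methods" can be illustrated with a small sketch. It is not the authors' implementation: the configuration keys, the predicate names (`hasRole`, `createdBy`, and so on), the `simulate` function, and the `stub_llm` placeholder are all hypothetical and chosen only for illustration. The sketch shows the general pattern of reading a configuration, letting each agent produce a document through a prompt, and recording both configured and generated background information as triples.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass
class KnowledgeGraph:
    """Minimal triple store standing in for the generator's knowledge graph."""
    triples: List[Triple] = field(default_factory=list)

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self.triples.append((subject, predicate, obj))

    def objects(self, subject: str, predicate: str) -> List[str]:
        return [o for s, p, o in self.triples if s == subject and p == predicate]


def stub_llm(prompt: str) -> str:
    """Hypothetical placeholder for a real LLM call; returns deterministic text."""
    return f"[generated text for: {prompt}]"


def simulate(config: dict, llm: Callable[[str], str] = stub_llm):
    """Run every configured task once and return (documents, knowledge graph)."""
    kg = KnowledgeGraph()
    documents: Dict[str, str] = {}

    # Background information given in the configuration goes into the graph first.
    for agent in config["agents"]:
        kg.add(agent["name"], "hasRole", agent["role"])

    # Each task is carried out by its assigned agent and yields one document.
    for i, task in enumerate(config["tasks"]):
        doc_id = f"doc{i}"
        role = kg.objects(task["agent"], "hasRole")[0]
        prompt = f"As a {role}, write a {task['doc_type']} about {task['topic']}."
        documents[doc_id] = llm(prompt)

        # Information created during the simulation is recorded as well.
        kg.add(doc_id, "createdBy", task["agent"])
        kg.add(doc_id, "hasType", task["doc_type"])
        kg.add(doc_id, "hasTopic", task["topic"])

    return documents, kg


# Assumed example configuration; agent names, roles, and tasks are invented.
config = {
    "agents": [
        {"name": "alice", "role": "project manager"},
        {"name": "bob", "role": "software engineer"},
    ],
    "tasks": [
        {"agent": "alice", "doc_type": "meeting invitation", "topic": "project kickoff"},
        {"agent": "bob", "doc_type": "status report", "topic": "prototype progress"},
    ],
}

documents, kg = simulate(config)
```

Because every document is linked to its author, type, and topic in the graph, an evaluation tool could later look up the ground truth for a document (for example `kg.objects("doc1", "createdBy")`) instead of relying on manual annotation. Replacing `stub_llm` with a call to an actual model is left open here, since the abstract does not say which model or API the generator uses.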

Using Large Language Models to Generate Authentic Multi-agent Knowledge Work Datasets

