Abstract: To address the challenge of automating knowledge discovery from a vast volume of literature, in this paper, we introduce a novel framework based on large language models (LLMs) that combines a progressive ontology prompting (POP) algorithm with a dual-agent system, named LLM-Duo, designed to enhance the automation of knowledge extraction from scientific articles. The POP algorithm utilizes a prioritized breadth-first search (BFS) across a predefined ontology to generate structured prompt templates and action orders, thereby guiding LLMs to discover knowledge in an automatic manner. Additionally, our LLM-Duo employs two specialized LLM agents: an explorer and an evaluator. These two agents work collaboratively and adversarially to enhance the reliability of the discovery and annotation processes. Experiments demonstrate that our method outperforms advanced baselines, enabling more accurate and complete annotations. To validate the effectiveness of our method in real-world scenarios, we employ our method in a case study of speech-language intervention discovery. Our method identifies 2,421 interventions from 64,177 research articles in the speech-language therapy domain. We curate these findings into a publicly accessible intervention knowledge base that holds significant potential to benefit the speech-language therapy community.

What problem does this paper attempt to address?

### Problems the paper attempts to solve

This paper addresses the challenge of automating knowledge discovery from a large body of literature. Specifically, the researchers propose a new framework based on large language models (LLMs) that combines the Progressive Ontology Prompting (POP) algorithm with a dual-agent system (LLM-Duo) to improve automatic knowledge extraction from scientific articles.

### Detailed interpretation

#### Background and motivation

With millions of research articles published every year, the body of scientific knowledge is huge, which presents both challenges and opportunities for researchers who acquire knowledge through advanced analysis tools and interdisciplinary methods. Discovering knowledge from scientific literature enables researchers to keep up with the latest developments in their fields and gain valuable insights, thereby significantly improving the quality of their work. However, because manual review is inefficient, only a small fraction of this knowledge is ever collected and organized. For example, in the healthcare field, evidence-based interventions are practices and treatments that are based on systematic research and proven effective through controlled studies. This emphasizes the importance of using evidence from well-designed and well-implemented research as the basis for medical decision-making.

#### Limitations of existing methods

Although large language models (LLMs) have shown great potential in automated knowledge discovery, they still face challenges when dealing with a large amount of domain knowledge. In particular, the context window of an LLM is limited, which restricts the amount of input text the model can process at one time, potentially leading to incomplete analysis and the loss of connections between data points across documents. Retrieval-Augmented Generation (RAG) addresses this issue by combining a retrieval component with a generation model, which gives the system access to a broader range of information than fits in the immediate context window of a single model.

#### Proposed methods

1. **Progressive Ontology Prompting (POP) algorithm**:
   - This algorithm uses a prioritized breadth-first search (BFS) to traverse a predefined ontology graph, generating structured prompt templates and action sequences that guide LLMs to discover knowledge automatically.
   - Specifically, the algorithm sorts neighbor nodes by their ratio of out-degree to in-degree and visits those with a higher ratio first, in order to reach most of the graph quickly.
2. **Dual-agent system (LLM-Duo)**:
   - This system contains two specialized LLM agents: the explorer and the evaluator.
   - The explorer is a RAG-based chatbot that generates annotation results in a zero-shot setting and argues with the evaluator to justify its answers.
   - The evaluator assesses the annotations and provides feedback that helps the explorer refine them.

#### Experiments and applications

- **Experimental verification**:
  - The researchers applied this method in the practical scenario of speech-language intervention discovery, identifying 2,421 interventions from 64,177 research articles.
  - The experimental results show that this method outperforms advanced baseline methods on multiple metrics, including Consistency Rounds, Verbosity Index, Enumeration Quantity, Faithfulness, Accuracy, and Cover.
- **Case study**:
  - Through the case study of speech-language intervention discovery, the discovered interventions were organized into a publicly available intervention knowledge base, which is of great significance to the speech-language therapy community.

### Main contributions

1. **Problem modeling**: Model LLM-based automated knowledge discovery as a prompt design and scheduling problem over a predefined ontology graph structure.
2. **New algorithm**: Design a new Progressive Ontology Prompting (POP) algorithm that converts knowledge graph ontologies into structured prompts and action sequences to achieve automatic knowledge discovery from literature.
3. **Dual-agent framework**: Propose a new annotation framework that improves the quality of knowledge discovery through the cooperation and competition of two LLM agents, with performance superior to advanced baselines.
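The prioritized BFS that POP relies on can be illustrated with a small sketch. This is not the authors' implementation: the toy ontology, the function names (`degree_ratio`, `prioritized_bfs`, `prompt_for`), the prompt wording, and the handling of nodes with zero in-degree are all assumptions made for illustration. It only shows the idea stated above, namely visiting neighbors with a higher out-degree to in-degree ratio first.

```python
from collections import deque


def degree_ratio(graph, node):
    """Out-degree divided by in-degree for one node.

    Assumption: an in-degree of 0 is treated as 1 to avoid division by zero.
    """
    out_deg = len(graph.get(node, []))
    in_deg = sum(node in targets for targets in graph.values())
    return out_deg / max(in_deg, 1)


def prioritized_bfs(graph, root):
    """Return (parent, child) steps in the order a prioritized BFS discovers them."""
    seen = {root}
    queue = deque([root])
    steps = []
    while queue:
        current = queue.popleft()
        # Unvisited neighbors, de-duplicated while keeping their listed order.
        fresh = [n for n in dict.fromkeys(graph.get(current, [])) if n not in seen]
        # Higher out/in ratio first; ties keep the original order (stable sort).
        fresh.sort(key=lambda n: degree_ratio(graph, n), reverse=True)
        for child in fresh:
            seen.add(child)
            steps.append((current, child))
            queue.append(child)
    return steps


def prompt_for(parent, child):
    """Hypothetical prompt template for one ontology edge."""
    return f"Given the {parent} described in the article, identify its {child}."


# Hypothetical toy ontology: each key points to the concepts it links to.
ontology = {
    "Intervention": ["Outcome", "Population", "Setting"],
    "Population": ["Disorder", "AgeGroup"],
    "Outcome": ["Measure"],
    "Setting": [],
    "Disorder": [],
    "AgeGroup": [],
    "Measure": [],
}

for parent, child in prioritized_bfs(ontology, "Intervention"):
    print(prompt_for(parent, child))
```

In this toy graph, `Population` is visited before `Outcome` and `Setting` because it has the highest ratio, so the traversal opens up the largest unexplored part of the ontology first.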
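The explorer and evaluator interaction in LLM-Duo can likewise be sketched as a feedback loop. The summary does not specify the actual protocol, so everything here is an assumption: the `duo_annotate` function, the callable signatures, the `max_rounds` cap, and the stopping rule (stop as soon as the evaluator accepts). The two stub agents are plain Python functions standing in for LLM calls.

```python
from typing import Callable, Optional, Tuple

# Assumed interfaces (hypothetical):
#   explorer(prompt, feedback)    -> annotation
#   evaluator(prompt, annotation) -> (accepted, feedback)
Explorer = Callable[[str, Optional[str]], str]
Evaluator = Callable[[str, str], Tuple[bool, str]]


def duo_annotate(
    prompt: str,
    explorer: Explorer,
    evaluator: Evaluator,
    max_rounds: int = 3,
) -> Tuple[str, int, bool]:
    """Run explorer/evaluator rounds; return (annotation, rounds_used, accepted)."""
    feedback: Optional[str] = None
    annotation = ""
    for round_no in range(1, max_rounds + 1):
        # The explorer answers, taking the previous round's feedback into account.
        annotation = explorer(prompt, feedback)
        # The evaluator either accepts the annotation or explains what is missing.
        accepted, feedback = evaluator(prompt, annotation)
        if accepted:
            return annotation, round_no, True
    return annotation, max_rounds, False


def stub_explorer(prompt: str, feedback: Optional[str]) -> str:
    """Stand-in for the RAG-based explorer: adds evidence once asked for it."""
    if feedback is None:
        return "speech therapy"
    return "speech therapy (evidence: section 2)"


def stub_evaluator(prompt: str, annotation: str) -> Tuple[bool, str]:
    """Stand-in for the evaluator: accepts only annotations that cite evidence."""
    if "evidence" in annotation:
        return True, ""
    return False, "Cite supporting evidence from the article."


result = duo_annotate("Identify the intervention.", stub_explorer, stub_evaluator)
print(result)
```

With these stubs the loop ends in the second round, once the explorer has revised its answer in response to the evaluator's feedback. The number of rounds used is returned so that it can be tracked, loosely in the spirit of the Consistency Rounds metric mentioned above.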

Automating Knowledge Discovery from Scientific Literature via LLMs: A Dual-Agent Approach with Progressive Ontology Prompting

Automating Knowledge Acquisition for Content-Centric Cognitive Agents Using LLMs

ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery

Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance

MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning

DrugAgent: Automating AI-aided Drug Discovery Programming through LLM Multi-Agent Collaboration

ODA: Observation-Driven Agent for integrating LLMs and Knowledge Graphs

SMoA: Improving Multi-agent Large Language Models with Sparse Mixture-of-Agents

An Enhanced Prompt-Based LLM Reasoning Scheme via Knowledge Graph-Integrated Collaboration

Fine-tuning and Prompt Engineering with Cognitive Knowledge Graphs for Scholarly Knowledge Organization

MDAgents: An Adaptive Collaboration of LLMs for Medical Decision-Making

Exploring Collaboration Mechanisms for LLM Agents: A Social Psychology View

AGENTiGraph: An Interactive Knowledge Graph Platform for LLM-based Chatbots Utilizing Private Data

CuriousLLM: Elevating Multi-Document QA with Reasoning-Infused Knowledge Graph Prompting

Large Language Model-Based Agents for Software Engineering: A Survey

Large Language Models for Scholarly Ontology Generation: An Extensive Analysis in the Engineering Field

KG-Agent: An Efficient Autonomous Agent Framework for Complex Reasoning over Knowledge Graph

Prompt Recursive Search: A Living Framework with Adaptive Growth in LLM Auto-Prompting

AutoFlow: Automated Workflow Generation for Large Language Model Agents