Abstract: Large language models (LLMs) have demonstrated significant success in a range of natural language processing (NLP) tasks in the general domain. The emergence of LLMs has introduced innovative methodologies across diverse fields, including the natural sciences. Researchers aim to implement automated, concurrent processes driven by LLMs to supplant conventional manual, repetitive, and labor-intensive work. In the domain of spectral analysis and detection, researchers must autonomously acquire pertinent knowledge across various research objects, including the spectroscopic techniques and chemometric methods employed in experiments and analysis. Paradoxically, although spectroscopic detection is recognized as an effective analytical method, the fundamental process of knowledge retrieval remains both time-intensive and repetitive. In response to this challenge, we first introduce the Spectral Detection and Analysis Based Paper (SDAAP) dataset, which is the first open-source textual knowledge dataset for spectral analysis and detection and contains annotated literature data as well as corresponding knowledge instruction data. We then design an automated Q&A framework based on the SDAAP dataset, which retrieves relevant knowledge and generates high-quality responses by extracting entities from the input as retrieval parameters. Notably, within this framework the LLM is used only as a tool to provide generalizability, while the retrieval-augmented generation (RAG) technique is used to accurately capture the source of the knowledge. This approach not only improves the quality of the generated responses but also ensures the traceability of the knowledge. Experimental results show that our framework generates responses with more reliable expertise than the baseline.

What problem does this paper attempt to address?

The paper addresses a problem in spectral detection: researchers must autonomously acquire relevant knowledge to determine the spectroscopic techniques and chemometric methods used in experiments, a process that is both time-consuming and repetitive. Large language models (LLMs) perform well on natural language processing tasks and have been introduced into the natural sciences to reduce time-consuming, labor-intensive work, but they often lack expertise in specialized fields such as spectral detection. Moreover, most existing related datasets focus on the biological sciences and medicine, while the spectral analysis field lacks open-source datasets. To address these issues, the authors first introduce the Spectral Detection and Analysis Based Paper (SDAAP) dataset, the first open-source textual knowledge dataset for spectral analysis and detection, which includes annotated literature data and corresponding knowledge instruction data. The authors then design an automated question-answering framework based on the SDAAP dataset. This framework parses the entities and question format of a query, uses the parsing results as parameters to retrieve relevant spectral detection knowledge, and generates high-quality answers. The approach improves the quality of generated answers and keeps the knowledge traceable to its source, thereby addressing the insufficient and unreliable knowledge of existing LLMs in specialized fields.
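The retrieval flow described above (extract entities from the question, use them as retrieval parameters, then pass the retrieved records and their sources to the model) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Entry` record, the toy `KNOWLEDGE_BASE` contents, the source keys, and the keyword-based `extract_entities` function are hypothetical stand-ins, not the SDAAP schema or the authors' implementation, and the LLM call itself is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    source: str     # hypothetical citation key, kept for traceability
    obj: str        # research object
    technique: str  # spectroscopic technique
    method: str     # chemometric method


# Toy stand-in for annotated literature records; all values are invented.
KNOWLEDGE_BASE = [
    Entry("paper-001", "milk", "near-infrared spectroscopy", "PLS regression"),
    Entry("paper-002", "milk", "Raman spectroscopy", "PCA"),
    Entry("paper-003", "soil", "near-infrared spectroscopy", "random forest"),
]


def extract_entities(question):
    """Find known objects and techniques mentioned in the question.

    Plain substring matching is an assumption here; it stands in for
    whatever entity extraction a real system would use.
    """
    text = question.lower()
    objects = sorted({e.obj for e in KNOWLEDGE_BASE if e.obj.lower() in text})
    techniques = sorted(
        {e.technique for e in KNOWLEDGE_BASE if e.technique.lower() in text}
    )
    return {"objects": objects, "techniques": techniques}


def retrieve(entities):
    """Return entries that match every entity type found in the question."""
    if not entities["objects"] and not entities["techniques"]:
        return []  # nothing to retrieve on
    hits = []
    for entry in KNOWLEDGE_BASE:
        if entities["objects"] and entry.obj not in entities["objects"]:
            continue
        if entities["techniques"] and entry.technique not in entities["techniques"]:
            continue
        hits.append(entry)
    return hits


def build_prompt(question, hits):
    """Assemble a prompt whose context lines carry their source keys."""
    context = "\n".join(
        f"[{e.source}] {e.obj}: {e.technique} with {e.method}" for e in hits
    )
    return (
        "Answer using only the context below and cite the source keys.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )


question = "Which chemometric methods are used with Raman spectroscopy on milk?"
hits = retrieve(extract_entities(question))
prompt = build_prompt(question, hits)
print(prompt)
```

Because each context line keeps its source key, an answer generated from this prompt can be traced back to the records it drew on, which is the traceability property the summary highlights. In this sketch the model only has to phrase the answer, while the retrieval step decides which knowledge is used.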

A Quick, trustworthy spectral knowledge Q&A system leveraging retrieval-augmented generation on LLM

Aggregated Knowledge Model: Enhancing Domain-Specific QA with Fine-Tuned and Retrieval-Augmented Generation Models

ChatSOS: LLM-based knowledge Q&A system for safety engineering

DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain Question Answering over Knowledge Base and Text

Retrieval-enhanced Knowledge Editing in Language Models for Multi-Hop Question Answering

REAR: A Relevance-Aware Retrieval-Augmented Framework for Open-Domain Question Answering

ADMUS: A Progressive Question Answering Framework Adaptable to Multiple Knowledge Sources

WeKnow-RAG: An Adaptive Approach for Retrieval-Augmented Generation Integrating Web Search and Knowledge Graphs

Know where to go: Make LLM a relevant, responsible, and trustworthy searchers

Heterogeneous Knowledge Grounding for Medical Question Answering with Retrieval Augmented Large Language Model

Retriever-and-Memory: Towards Adaptive Note-Enhanced Retrieval-Augmented Generation

RetrievalQA: Assessing Adaptive Retrieval-Augmented Generation for Short-form Open-Domain Question Answering

Enhancing Retrieval and Managing Retrieval: A Four-Module Synergy for Improved Quality and Efficiency in RAG Systems

MKRAG: Medical Knowledge Retrieval Augmented Generation for Medical Question Answering

Domain-Specific Retrieval-Augmented Generation Using Vector Stores, Knowledge Graphs, and Tensor Factorization

Enhancing Large Language Models with Knowledge Graphs for Robust Question Answering

Retrieval-Augmented Generation for Large Language Models: A Survey

Enhancing LLM Factual Accuracy with RAG to Counter Hallucinations: A Case Study on Domain-Specific Queries in Private Knowledge-Bases

DR-RAG: Applying Dynamic Document Relevance to Retrieval-Augmented Generation for Question-Answering

Knowledge Tagging System on Math Questions via LLMs with Flexible Demonstration Retriever

RETA-LLM: A Retrieval-Augmented Large Language Model Toolkit