Abstract: Large language models (LLMs) have emerged as powerful machine-learning systems capable of handling a myriad of tasks. Tuned versions of these systems have been turned into chatbots that can respond to user queries on a vast diversity of topics, providing informative and creative replies. However, their application to physical science research remains limited owing to their incomplete knowledge in these areas, contrasted with the needs of rigor and sourcing in science domains. Here, we demonstrate how existing methods and software tools can be easily combined to yield a domain-specific chatbot. The system ingests scientific documents in existing formats, and uses text embedding lookup to provide the LLM with domain-specific contextual information when composing its reply. We similarly demonstrate that existing image embedding methods can be used for search and retrieval across publication figures. These results confirm that LLMs are already suitable for use by physical scientists in accelerating their research efforts.

What problem does this paper attempt to address?

The paper explores how existing methods and technologies can be combined to build domain-specific chatbots, particularly for physical science research. The main issues it attempts to address are:

1. **Enhancing research efficiency**: By integrating existing technologies, the paper aims to construct a chatbot capable of understanding and answering specialized questions in the physical sciences, thereby accelerating the research process.
2. **Addressing the limitations of large language models (LLMs) in scientific applications**: Although current LLMs are powerful, their knowledge of specific fields such as the physical sciences is incomplete, making it difficult to meet the demands for precision and source traceability in scientific research.
3. **Overcoming the hallucination problem**: LLMs sometimes generate information that appears reasonable but is actually incorrect (hallucinations). The paper proposes reducing such issues by providing the model with specific document fragments as contextual information.
4. **Avoiding the need to train new models from scratch**: The method does not require retraining an LLM. Instead, it uses text embeddings to retrieve relevant document fragments and supplies them to the model as context, thereby achieving domain-specific conversational capabilities.
5. **Integrating image retrieval functionality**: In addition to text, the paper discusses how image embeddings can be used to retrieve figures from scientific publications that relate to a user query, further enriching the chatbot's functionality.

In summary, the paper aims to demonstrate how to quickly build a chatbot that can assist physical science research using existing technologies and tools, thereby improving the efficiency and quality of researchers' work.
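The embedding-lookup idea summarized above can be illustrated with a minimal sketch. This is not the paper's implementation: the `embed` function here is a toy bag-of-words stand-in for a real text-embedding model, the `CHUNKS` entries are invented placeholder text, and the names `retrieve` and `build_prompt` are hypothetical. The sketch only shows the general pattern of ranking document fragments by similarity to the query and placing the best matches, with their source tags, in the prompt sent to the LLM.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: word counts. ASSUMPTION: stands in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks whose embeddings are most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c["text"])), reverse=True)
    return ranked[:k]


def build_prompt(query, retrieved):
    """Compose a prompt that gives the LLM the retrieved excerpts as context."""
    context = "\n".join(f"[{c['source']}] {c['text']}" for c in retrieved)
    return (
        "Answer using only the excerpts below and cite the source tags.\n\n"
        f"Excerpts:\n{context}\n\nQuestion: {query}"
    )


# Placeholder document fragments (invented for illustration only).
CHUNKS = [
    {"source": "doc-A", "text": "X-ray scattering measures the structure of thin films."},
    {"source": "doc-B", "text": "Block copolymers self-assemble into ordered nanoscale morphologies."},
    {"source": "doc-C", "text": "The beamline detector records scattering images at high frame rates."},
]

if __name__ == "__main__":
    question = "How do block copolymers self-assemble?"
    top = retrieve(question, CHUNKS, k=1)
    print(build_prompt(question, top))
```

In a real system the toy `embed` would be replaced by a learned embedding model and the linear scan by a vector index; the same ranking pattern would apply to figures if `embed` produced image embeddings instead.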

Domain-specific ChatBots for Science using Embeddings

ChatEd: A Chatbot Leveraging ChatGPT for an Enhanced Learning Experience in Higher Education

Data science through natural language with ChatGPT's Code Interpreter

Beyond Traditional Teaching: The Potential of Large Language Models and Chatbots in Graduate Engineering Education

ChatCell: Facilitating Single-Cell Analysis with Natural Language

EduChat: A Large-Scale Language Model-based Chatbot System for Intelligent Education

A Domain-Specific Next-Generation Large Language Model (LLM) or ChatGPT is Required for Biomedical Engineering and Research

Bio-Eng-LMM AI Assist chatbot: A Comprehensive Tool for Research and Education

Walert: Putting Conversational Search Knowledge into Action by Building and Evaluating a Large Language Model-Powered Chatbot

A Self-enhancement Approach for Domain-specific Chatbot Training via Knowledge Mining and Digest

ChatGPT, Bard, and Large Language Models for Biomedical Research: Opportunities and Pitfalls

Large Language Model‐Based Chatbots in Higher Education

A Complete Survey on LLM-based AI Chatbots

A Platform for the Biomedical Application of Large Language Models

Chatbot-Based Ontology Interaction Using Large Language Models and Domain-Specific Standards

Language Model Powered Digital Biology with BRAD

An LLM-Driven Chatbot in Higher Education for Databases and Information Systems

Quo Vadis ChatGPT? From Large Language Models to Large Knowledge Models

PDF CHAT_BOT USING GENERATIVE AI (LLMS&RAG)

ChatGPT Alternative Solutions: Large Language Models Survey