HouYi: An open-source large language model specially designed for renewable energy and carbon neutrality field

Mingliang Bai, Zhihao Zhou, Ruidong Wang, Yusheng Yang, Zizhen Qin, Yunxiao Chen, Chunjin Mu, Jinfu Liu, Daren Yu

2023-07-31

Abstract: Renewable energy is important for achieving the carbon neutrality goal. With the great success of Large Language Models (LLMs) like ChatGPT in automatic content generation, LLMs are playing an increasingly important role. However, no LLM has been designed specifically for renewable energy, and no renewable energy dataset exists for training LLMs. Therefore, this paper publishes the first open-source Renewable Energy Academic Paper (REAP) dataset for non-commercial LLM research on renewable energy. The REAP dataset was collected by searching the titles and abstracts of 1,168,970 academic publications in Web of Science. Based on the REAP dataset, the HouYi model, the first LLM for renewable energy, was developed by fine-tuning general LLMs. HouYi demonstrates strong academic paragraph generation ability in the renewable energy field. Experiments show that its ability to generate academic papers on renewable energy is comparable to that of ChatGPT, slightly better than that of Claude, ERNIE Bot, and SparkDesk, and significantly better than that of the open-source LLaMA-13B model.

Computation and Language, Artificial Intelligence

What problem does this paper attempt to address?

This paper addresses two gaps: the lack of large language models (LLMs) designed specifically for renewable energy and carbon neutrality, and the lack of open-source datasets for training them. Specifically:

1. **Lack of domain-specific LLMs**: General LLMs such as ChatGPT perform well across many fields, but they are not optimized for renewable energy and carbon neutrality. As a result, they may not give the most accurate or relevant answers to field-specific questions.
2. **Lack of open-source datasets**: No publicly available renewable energy dataset exists for training LLMs. This limits researchers' ability to develop and improve language models for the field.

To address these gaps, the paper makes two contributions:

- **Construct the REAP dataset**: The authors collected the titles and abstracts of 1,168,970 academic papers from the Web of Science database and built the first open-source Renewable Energy Academic Paper (REAP) dataset for non-commercial LLM research.
- **Develop the HouYi model**: By fine-tuning general LLMs (such as ChatGLM-6B) on the REAP dataset, the authors developed HouYi, the first LLM designed specifically for the renewable energy field.

Through these contributions, the paper aims to make academic writing on renewable energy and carbon neutrality more efficient and to promote research and development in the field.
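As a rough illustration of how title and abstract records like those in REAP could be prepared for fine-tuning, the sketch below converts each record into a prompt/response pair and serializes the result as JSON Lines. The record keys (`title`, `abstract`), the prompt template, and the function names are all assumptions made for this example; the source does not describe the actual REAP schema or the prompt format used to train HouYi.

```python
import json
from typing import Dict, Iterable, List


def build_example(title: str, abstract: str) -> Dict[str, str]:
    """Turn one (title, abstract) record into a prompt/response pair."""
    # Collapse runs of whitespace and newlines left over from export files.
    title = " ".join(title.split())
    abstract = " ".join(abstract.split())
    # Hypothetical prompt template, not the one used by the HouYi authors.
    prompt = f"Write an abstract for a paper titled: {title}"
    return {"prompt": prompt, "response": abstract}


def build_dataset(records: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only records that have both a title and an abstract."""
    examples = []
    for rec in records:
        title = rec.get("title", "")
        abstract = rec.get("abstract", "")
        if not title.strip() or not abstract.strip():
            continue  # skip incomplete records
        examples.append(build_example(title, abstract))
    return examples


def to_jsonl(examples: Iterable[Dict[str, str]]) -> str:
    """Serialize examples as JSON Lines, one training example per line."""
    return "\n".join(json.dumps(e, ensure_ascii=False) for e in examples)


if __name__ == "__main__":
    # Made-up records for demonstration only.
    sample = [
        {"title": "Short-term wind power forecasting",
         "abstract": "We study short-term\nwind power forecasting."},
        {"title": "A record without an abstract", "abstract": ""},
    ]
    print(to_jsonl(build_dataset(sample)))
```

A prompt/response layout like this is one common input shape for supervised fine-tuning of a general LLM; which fields serve as prompt and which as target is a design choice that the source leaves unspecified.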
