Scaling Data-Driven Building Energy Modelling using Large Language Models

Sunil Khadka, Liang Zhang
2024-07-04
Abstract: Building Management Systems (BMS) developed through data-driven methods always face data and model scalability issues. We propose a methodology to tackle the scalability challenges associated with the development of data-driven models for BMS by using Large Language Models (LLMs). LLMs' code generation adaptability can enable broader adoption of BMS by "automating the automation," particularly the data handling and data-driven modeling processes. In this paper, we use LLMs to generate code that processes structured data from BMS and builds data-driven models for BMS's specific requirements. This eliminates the need for manual data and model development, reducing the time, effort, and cost associated with this process. Our hypothesis is that LLMs can incorporate domain knowledge about data science and BMS into data processing and modeling, ensuring that data-driven modeling is automated for the specific requirements of different building types and control objectives, which also improves accuracy and scalability. We generate a prompt template following the framework of Machine Learning Operations so that the prompts are designed to systematically generate Python code for data-driven modeling. Our case study indicates that bi-sequential prompting under the prompt template can achieve a high success rate of code generation and code accuracy, and significantly reduce human labor costs.
Software Engineering, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses the scalability challenges that data-driven modeling faces in building energy management. Each building has different characteristics, so each one requires a large amount of data and a customized data-driven model, which makes model development difficult. Developing and deploying these systems also usually requires a great deal of manual work and expertise. The authors therefore propose using large language models (LLMs) to generate code that automates the data processing and data-driven modeling steps, reducing time, effort, and cost while improving the accuracy and scalability of the models.
The main contributions of the paper are:
1. **Proposing an LLM-based method for automated data processing and modeling**: LLMs generate Python code that processes structured data from building management systems (BMS) and constructs data-driven models that meet specific requirements.
2. **Designing prompt templates**: The templates follow the machine learning operations (MLOps) framework so that prompts systematically generate Python code for data-driven modeling.
3. **Evaluating three prompting strategies**: One-shot prompting, step-wise sequential prompting, and bi-sequential prompting are compared on code generation tasks.
4. **Conducting a case study**: For a virtual small office building, ChatGPT-4 generates code to predict the building's cooling rate, and the results are compared with those of manual coding.
Through these contributions, the paper shows how LLMs can be used to address the scalability problem in data-driven building energy management.
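To make the difference between the three prompting strategies concrete, the following is a minimal Python sketch of how prompts might be assembled under each one. It is not the authors' template: the stage wording, the `Task` class, the `build_prompts` function, and the file and column names are all hypothetical. In particular, the sketch assumes that bi-sequential prompting means two consecutive prompts, one for data handling and one for modeling. That reading is suggested by the abstract's emphasis on those two processes, but it is not confirmed by the text above.

```python
from dataclasses import dataclass

# Hypothetical MLOps-style stages (assumption: the paper's actual
# prompt template is not reproduced in this summary).
DATA_STAGES = [
    "load the BMS data from the CSV file",
    "clean the data and handle missing values",
    "select and engineer input features",
]
MODEL_STAGES = [
    "train a regression model",
    "evaluate the model on held-out data",
]


@dataclass
class Task:
    """Illustrative description of one modeling task."""
    data_file: str  # hypothetical file name
    target: str     # hypothetical target column


def _render(task: Task, stages: list[str]) -> str:
    """Render one prompt that asks for code covering the given stages."""
    steps = "\n".join(f"{i}. {s}" for i, s in enumerate(stages, start=1))
    return (
        f"Write Python code that uses '{task.data_file}' "
        f"to predict '{task.target}'. Follow these steps:\n{steps}"
    )


def build_prompts(task: Task, strategy: str) -> list[str]:
    """Return the ordered list of prompts for one prompting strategy."""
    all_stages = DATA_STAGES + MODEL_STAGES
    if strategy == "one_shot":
        # Everything is requested in a single prompt.
        return [_render(task, all_stages)]
    if strategy == "step_wise":
        # One prompt per stage, sent in sequence.
        return [_render(task, [stage]) for stage in all_stages]
    if strategy == "bi_sequential":
        # Assumed split: data handling first, then modeling.
        return [_render(task, DATA_STAGES), _render(task, MODEL_STAGES)]
    raise ValueError(f"unknown strategy: {strategy!r}")


task = Task(data_file="small_office.csv", target="cooling_rate")
prompts = build_prompts(task, "bi_sequential")
```

In a real pipeline each prompt would be sent to the LLM in order, with the code returned by earlier prompts supplied as context for later ones. The sketch leaves that step out so that it does not depend on any particular LLM client API.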