Classification prediction model of indoor PM2.5 concentration using CatBoost algorithm

Zhenwei Guo, Xinyu Wang, Liang Ge
DOI: https://doi.org/10.3389/fbuil.2023.1207193
2023-08-01
Frontiers in Built Environment
Abstract: Creating a healthier indoor environment in office buildings is increasingly important. Accurate and reliable prediction of PM2.5 concentration can effectively mitigate the response delay of indoor air quality control systems. The rapid development of machine learning provides a research basis for indoor air quality systems to control PM2.5 concentration. This study applies the CatBoost algorithm, which is based on ordered boosting, to the classification and prediction of indoor PM2.5 concentration. Using actual monitoring data from an office building, we take the previous indoor PM2.5 concentration, indoor temperature, relative humidity, CO2 concentration, and illuminance as input variables; the output indicates whether the indoor PM2.5 concentration exceeds 25 μg/m³. Based on the CatBoost algorithm, we construct an intelligent classification prediction model for indoor PM2.5 concentration. The model is evaluated on actual data and compared with multilayer perceptron (MLP), gradient boosting decision tree (GBDT), logistic regression (LR), decision tree (DT), and k-nearest neighbors (KNN) models. The CatBoost model performs best, achieving an area under the ROC curve (AUC) of 0.949 after hyperparameter optimization. Across the five input variables, feature importance ranks as follows, from highest to lowest: previous indoor PM2.5 concentration, relative humidity, CO2 concentration, indoor temperature, and illuminance. Verification shows that the CatBoost-based prediction model can accurately predict the indoor PM2.5 concentration level. The model can be used to predict in advance whether the indoor PM2.5 concentration will exceed the standard and to guide the air quality control system in regulating it.
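The classification target and the AUC metric described in the abstract can be illustrated with a minimal standard-library sketch. It is not the authors' implementation and does not train a CatBoost model: the function names (`make_labels`, `make_lagged_pairs`, `roc_auc`) are hypothetical, the one-step lag is an assumption, and "exceeds" is assumed to mean strictly greater than 25 μg/m³.

```python
from typing import List, Sequence, Tuple

# Exceedance threshold stated in the abstract (μg/m³).
THRESHOLD_UG_M3 = 25.0


def make_labels(pm25: Sequence[float], threshold: float = THRESHOLD_UG_M3) -> List[int]:
    """Return 1 where the concentration strictly exceeds the threshold, else 0."""
    return [1 if value > threshold else 0 for value in pm25]


def make_lagged_pairs(pm25: Sequence[float], lag: int = 1) -> List[Tuple[float, int]]:
    """Pair each earlier reading with the label of the reading `lag` steps later.

    Assumption: a single lagged PM2.5 value stands in for the "previous
    indoor PM2.5 concentration" input; the paper's exact lag is not given here.
    """
    labels = make_labels(pm25)
    return [(pm25[i - lag], labels[i]) for i in range(lag, len(pm25))]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive sample outscores
    a random negative sample, counting ties as one half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Invented readings for illustration only; not data from the paper.
    readings = [12.0, 18.5, 27.3, 31.0, 24.9, 22.1]
    pairs = make_lagged_pairs(readings)
    features = [x for x, _ in pairs]
    targets = [y for _, y in pairs]
    # Using the previous reading itself as a naive score gives a baseline AUC.
    print(roc_auc(targets, features))
```

In a full pipeline, the scores passed to `roc_auc` would be the exceedance probabilities predicted by the trained classifier, and each feature row would also carry temperature, relative humidity, CO2 concentration, and illuminance.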