Towards Comparability in Non-Intrusive Load Monitoring: On Data and Performance Evaluation

Christoph Klemenjak, Stephen Makonin, Wilfried Elmenreich
DOI: https://doi.org/10.48550/arXiv.2001.07708
2020-01-20
Abstract: Non-Intrusive Load Monitoring (NILM) comprises a set of techniques that provide insights into the energy consumption of households and industrial facilities. Latest contributions show significant improvements in terms of accuracy and generalisation abilities. Despite all progress made concerning disaggregation techniques, performance evaluation and comparability remain an open research question. The lack of standardisation and consensus on evaluation procedures makes reproducibility and comparability extremely difficult. In this paper, we draw attention to comparability in NILM with a focus on highlighting the considerable differences amongst common energy datasets used to test the performance of algorithms. We divide the discussion on comparability into data aspects and performance metrics, and take a close look at evaluation processes. Detailed information on pre-processing as well as data cleaning methods, the importance of unified performance reporting, and the need for complexity measures in load disaggregation are found to be the most urgent issues in NILM-related research. In addition, our evaluation suggests that datasets should be chosen carefully. We conclude by formulating suggestions for future work to enhance comparability.
Signal Processing, Machine Learning
What problem does this paper attempt to address?
This paper addresses the comparability problem in Non-Intrusive Load Monitoring (NILM) research. Although load disaggregation techniques have progressed significantly, performance evaluation and the comparability of results remain open research problems. The paper points out that the lack of standardised, agreed-upon evaluation procedures makes it extremely difficult to reproduce results and to compare different studies. To improve this situation, it highlights the considerable differences between common energy datasets and discusses comparability from three angles: dataset characteristics, performance metrics, and evaluation processes. The main contributions of the paper include:

1. **Dataset analysis**: It compares in detail the characteristics of several commonly used low-sampling-rate energy datasets, such as duration, sampling rate, and the number of installed sub-meters, and points out how these characteristics affect the performance of load disaggregation algorithms.
2. **Performance metrics**: It emphasizes the importance of choosing appropriate performance metrics and suggests using normalized metrics so that appliances with different power levels can be compared.
3. **Evaluation process**: It proposes two new measures, the test set ratio (TSR) and the event ratio (EVR), which quantify the amount of data and the number of events used in the evaluation process and thereby make evaluation results easier to compare.
4. **Noise level**: It suggests evaluating the noise level in the aggregate power signal and introduces the noise-to-aggregation ratio (NAR) as a measure of it.

With these measures, the paper aims to provide suggestions for improving comparability in future NILM research, especially when closed-source datasets are used or when privacy issues are involved.
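
To make the noise-level idea concrete, here is a minimal Python sketch of a NAR-style computation. The summary above does not state the formula, so the definition used here is an assumption: noise is taken as the part of the aggregate signal not explained by the sub-metered appliances, and the ratio is the summed absolute noise divided by the summed aggregate power. The function name `noise_to_aggregate_ratio` and the sample readings are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def noise_to_aggregate_ratio(
    aggregate: Sequence[float],
    submeters: Sequence[Sequence[float]],
) -> float:
    """Share of the aggregate signal not covered by sub-metered appliances.

    ASSUMED definition (not confirmed by the summary above):
        NAR = sum_t |y_t - sum_m x_t^(m)| / sum_t y_t
    where y_t is the aggregate power and x_t^(m) is the power of sub-meter m.
    """
    # Every sub-meter channel must be aligned with the aggregate signal.
    for channel in submeters:
        if len(channel) != len(aggregate):
            raise ValueError("sub-meter and aggregate lengths differ")

    total = sum(aggregate)
    if total <= 0:
        raise ValueError("aggregate signal must have positive total power")

    noise = 0.0
    for t, y in enumerate(aggregate):
        metered = sum(channel[t] for channel in submeters)
        noise += abs(y - metered)  # unexplained power at time step t
    return noise / total


# Hypothetical readings in watts: three time steps, two sub-metered appliances.
aggregate = [100.0, 200.0, 300.0]
submeters = [[50.0, 100.0, 150.0], [30.0, 60.0, 90.0]]
nar = noise_to_aggregate_ratio(aggregate, submeters)
```

Under this assumed definition, a value near 0 means the sub-meters account for almost all of the aggregate consumption, while a larger value indicates more unmetered load, which would be expected to make disaggregation harder on that dataset.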