Abstract: Access to smart meter data is essential to rapid and successful transitions to electrified grids, underpinned by flexibility delivered by low carbon technologies, such as electric vehicles (EV) and heat pumps, and powered by renewable energy. Yet little of this data is available for research and modelling purposes due to consumer privacy protections. Whilst many are calling for raw datasets to be unlocked through regulatory changes, we believe this approach will take too long. Synthetic data addresses these challenges directly by overcoming privacy issues. In this paper, we present Faraday, a Variational Auto-encoder (VAE)-based model trained on over 300 million smart meter data readings from an energy supplier in the UK, with information such as property type and low carbon technology (LCT) ownership. The model produces household-level synthetic load profiles conditioned on these labels, and we compare its outputs against actual substation readings to show how the model can be used for real-world applications by grid modellers interested in modelling energy grids of the future.
What problem does this paper attempt to address?
This paper focuses on using machine learning to generate high-quality synthetic smart meter data that protects user privacy, in order to support the modeling and research of future power systems. Specifically, it addresses the following key issues:
1. **Data Access Restrictions**: Due to consumer privacy protection policies, actual smart meter data is difficult to obtain for research and modeling purposes.
2. **Future Energy System Modeling Needs**: With the increasing proportion of renewable energy and the proliferation of low-carbon technologies (such as electric vehicles and heat pumps), the grid faces new challenges, such as peak demand, grid constraints, and supply-demand mismatches. To better understand and plan future energy systems, it is necessary to simulate household electricity usage patterns under different scenarios.
3. **Insufficient Conditional Generation Capability**: Some existing synthetic data generation methods can create smart meter data but lack the ability to conditionally generate data based on the type of low-carbon technology owned by the household.
To address these issues, the paper proposes Faraday, a model that combines a Variational Auto-encoder (VAE) with a Gaussian Mixture Model (GMM). The model achieves its goals through the following steps:
- Training on a proprietary dataset containing over 300 million smart meter readings from an energy supplier in the UK, which includes information about house types and low-carbon technology ownership.
- Adopting a modified VAE architecture that uses a Maximum Mean Discrepancy (MMD) loss instead of the traditional KL divergence loss, to accommodate the non-normal distribution of smart meter data.
- Fitting a GMM to the latent space of the trained VAE to capture the distribution of the latent representations, so that more diverse samples can be generated at inference time.
- Supporting conditional sampling, i.e., generating synthetic load curves under specific conditions based on user-provided information (such as whether they own an electric vehicle, property type, etc.).
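The MMD term mentioned in the steps above can be illustrated with a minimal sketch. It assumes a Gaussian (RBF) kernel, a biased estimator, and a comparison between encoded latents and samples from a standard normal prior, as is common in MMD-based VAEs; the summary does not state which kernel or estimator Faraday uses, so the function names, the `bandwidth` parameter, and the toy data are all hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    sq_dists = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def mmd_squared(x, y, bandwidth=1.0):
    """Biased estimate of the squared MMD between samples x and y."""
    k_xx = rbf_kernel(x, x, bandwidth).mean()
    k_yy = rbf_kernel(y, y, bandwidth).mean()
    k_xy = rbf_kernel(x, y, bandwidth).mean()
    return k_xx + k_yy - 2.0 * k_xy


rng = np.random.default_rng(0)
prior_a = rng.normal(size=(256, 2))            # samples from the assumed prior
prior_b = rng.normal(size=(256, 2))            # a second, independent draw
shifted = rng.normal(loc=2.0, size=(256, 2))   # stand-in for poorly matched latents

mmd_same = mmd_squared(prior_a, prior_b)       # small: same distribution
mmd_shifted = mmd_squared(shifted, prior_b)    # larger: distributions differ
print(f"same distribution: {mmd_same:.4f}, shifted: {mmd_shifted:.4f}")
```

In a training loop, a term like `mmd_squared(encoded_latents, prior_samples)` would be added to the reconstruction loss in place of the KL term; the relative weighting of the two is a further design choice not covered here.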
In this way, the Faraday model can generate highly realistic synthetic smart meter data while protecting user privacy, thereby helping grid planners better understand future grid behavior and providing valuable tools for related research and applications.
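The latent-space GMM and conditional sampling steps can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: it fits a separate scikit-learn `GaussianMixture` per label value (one simple way to condition, which may differ from what Faraday does), uses random vectors as a stand-in for encoder outputs, and replaces the trained decoder with a hypothetical fixed linear map to 48 readings per day (half-hourly resolution is assumed).

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
latent_dim, n_households, readings_per_day = 4, 500, 48

# Stand-in for encoder outputs: EV owners get a shifted latent distribution.
has_ev = rng.integers(0, 2, size=n_households)
latents = rng.normal(size=(n_households, latent_dim)) + 3.0 * has_ev[:, None]

# Assumption: one mixture per label value, fitted on that label's latents.
gmms = {}
for label in (0, 1):
    gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=0)
    gmms[label] = gmm.fit(latents[has_ev == label])

# Hypothetical decoder weights; a real system would use the trained VAE decoder.
decoder_weights = rng.normal(size=(latent_dim, readings_per_day))


def sample_latents(label, n_samples):
    """Draw latent vectors from the mixture fitted for the given label."""
    z, _component_ids = gmms[label].sample(n_samples)
    return z


def decode(z):
    """Map latents to non-negative daily load profiles (toy linear decoder)."""
    return np.maximum(z @ decoder_weights, 0.0)


ev_profiles = decode(sample_latents(label=1, n_samples=10))
print(ev_profiles.shape)  # (10, 48)
```

Because sampling draws new points from the fitted mixture rather than replaying encoded training households, the generated profiles are not copies of any one customer's readings, although this alone is not a formal privacy guarantee.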