Abstract: Existing digital distribution network models, like those in the databases of network utilities, are known to contain erroneous or untrustworthy information. This can compromise the effectiveness of physics-based engineering simulations and technologies, in particular those that are needed to deliver the energy transition. The large-scale rollout of smart meters presents new opportunities for data-driven system identification in distribution networks, enabling the improvement of existing data sets. Despite the increasing academic attention to system identification for distribution networks, researchers often make troublesome assumptions on what data is available and/or trustworthy. In this paper, we highlight some differences between academic efforts and first-hand industrial experiences, in order to steer the former towards more applicable research solutions.

What problem does this paper attempt to address?

The paper focuses on quality issues in distribution network datasets, and in particular on how these issues compromise the physics-based engineering simulations and technologies needed for the energy transition. The authors point out that, although the large-scale deployment of smart meters opens new opportunities for data-driven system identification in distribution networks, academic research often makes unrealistic assumptions about which data is available and how reliable it is. Specifically, the paper discusses the following four issues:

1. **Modeling errors or simplifications**: for example, applying Kron reduction in networks where the neutral is not grounded everywhere.
2. **Network data errors**: such as incorrect impedance values and topology information.
3. **Inevitable measurement errors**: "noise" or "bad" data due to sensor tolerances or failures.
4. **Insufficient measurements**: including semantic mismatches (e.g., average values instead of instantaneous values), granularity mismatches (e.g., three-phase totals instead of phase-separated measurements), and label mismatches (e.g., incorrect location or phase metadata).

The paper emphasizes the importance of addressing these issues and offers several recommendations: understanding, applying, and improving best practices for network dataset development; developing automated data cleaning and maintenance tools; and establishing practical methods to verify the effectiveness of data corrections.

Additionally, the paper details specific issues found in actual network data and proposes the concept of a system identification/network data cleaning framework that gradually improves the quality of existing distribution network models through a series of calibration tasks. This framework includes steps such as analyzing the input data, selecting the most appropriate processing workflow, applying the methods, and verifying the results.
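To make the first issue above concrete, the following is a minimal sketch of Kron reduction for a single line segment. It is not taken from the paper. The function name `kron_reduce`, the conductor ordering (phases first, neutral last), and the impedance values are all assumptions made for illustration. The reduction is exact only when the voltage drop along the neutral is zero, which is the grounding assumption that fails in networks where the neutral is not grounded everywhere.

```python
import numpy as np


def kron_reduce(z_primitive: np.ndarray, n_phase: int = 3) -> np.ndarray:
    """Eliminate neutral conductors from a primitive series impedance matrix.

    Assumes conductors are ordered phases first, neutrals last, and that the
    voltage drop along every neutral is zero (neutral grounded at both ends).
    """
    z_pp = z_primitive[:n_phase, :n_phase]  # phase-phase block
    z_pn = z_primitive[:n_phase, n_phase:]  # phase-neutral block
    z_np = z_primitive[n_phase:, :n_phase]  # neutral-phase block
    z_nn = z_primitive[n_phase:, n_phase:]  # neutral-neutral block
    # Z_red = Z_pp - Z_pn * inv(Z_nn) * Z_np, computed without an explicit inverse
    return z_pp - z_pn @ np.linalg.solve(z_nn, z_np)


# Hypothetical 4-wire segment (ohm): equal self and mutual impedances
z_self, z_mutual = 0.5 + 0.8j, 0.1 + 0.3j
z4 = np.full((4, 4), z_mutual, dtype=complex)
np.fill_diagonal(z4, z_self)

z3 = kron_reduce(z4)

# Check the underlying assumption: with the neutral current implied by a zero
# neutral voltage drop, the 4-wire and reduced 3-wire models agree on the phases.
i_phase = np.array([10.0, -4.0 + 2.0j, 1.0 - 6.0j])
i_neutral = -np.linalg.solve(z4[3:, 3:], z4[3:, :3] @ i_phase)
v_drop_4wire = z4 @ np.concatenate([i_phase, i_neutral])
```

If the neutral is not grounded at both ends, the neutral voltage drop is generally nonzero and the reduced 3x3 matrix no longer reproduces the phase voltage drops of the 4-wire model.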
Finally, the paper calls for researchers to consider real-world application scenarios and suggests that future research should focus on addressing underexplored sources of network data errors, integrating methods for handling multiple error sources, understanding the impact of adverse measurement conditions on system identification methods, and integrating real-world validation strategies.
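The semantic and granularity mismatches listed under insufficient measurements can be illustrated with a small sketch. It is an illustration under stated assumptions, not the paper's method: `meter_report` models a hypothetical meter that only reports the interval average of the three-phase total, and both load profiles are invented values.

```python
import numpy as np


def meter_report(p_phase_kw: np.ndarray) -> float:
    """Interval-averaged three-phase total power (kW) of a hypothetical meter.

    p_phase_kw has shape (n_samples, 3): one column per phase.
    """
    return float(p_phase_kw.sum(axis=1).mean())


# Profile A: 2 kW on each phase for all 30 one-minute samples
balanced = np.full((30, 3), 2.0)

# Profile B: 12 kW on phase a for the first 15 minutes, nothing otherwise
unbalanced = np.zeros((30, 3))
unbalanced[:15, 0] = 12.0

report_a = meter_report(balanced)
report_b = meter_report(unbalanced)
peak_a = balanced.max()
peak_b = unbalanced.max()
```

Both profiles yield the same reported value, although their per-phase peaks differ by a factor of six. A method that assumes instantaneous, phase-separated measurements would therefore be fed data that cannot distinguish between the two situations.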

Data quality challenges in existing distribution network datasets
