Abstract: Anomalies in parallel power loads are processed with a fast density peak clustering technique that combines the strengths of the Canopy and K-means algorithms within Apache Mahout's distributed machine-learning environment. The study relies on Apache Hadoop's data storage and processing tools, including HDFS and MapReduce, to manage and analyze the data at scale. In the preprocessing phase, Canopy clustering quickly produces an initial partition of the data points, which K-means then refines to improve clustering performance. The hybrid algorithm was implemented to minimize the time needed to handle the large volume of detected parallel power load abnormalities. Data vectors are generated based on the time needed, sequential and parallel candidate feature data are obtained, and the data rate is combined. After the time set is classified using Canopy with K-means and a factor-weighted vector representation, the clustering quality is assessed using purity, precision, recall, and F value. The experimental results show that using Canopy as a preprocessing step reduces both the time needed to process the large number of parallel power load abnormalities in a fast density peak dataset and the running time of the K-means algorithm itself. The tests also demonstrate that the combined Canopy and K-means algorithm performs consistently and dependably on the Hadoop platform, and that its clustering results offer a scalable and effective solution for power system monitoring.
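
The two-stage idea described in the abstract can be illustrated with a minimal single-machine sketch. It is not the Mahout or MapReduce implementation used in the study: the toy load vectors, the thresholds `t1` and `t2`, and all function names are assumptions chosen for illustration. Canopy makes one cheap pass to propose initial centers (and therefore the number of clusters), K-means refines them, and purity is computed against hypothetical ground-truth labels.

```python
import math
from collections import Counter

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(members):
    """Component-wise mean of a non-empty list of vectors."""
    return tuple(sum(c) / len(members) for c in zip(*members))

def canopy_centers(points, t1, t2):
    """One-pass Canopy clustering with loose threshold t1 > tight threshold t2.
    Returns one center (mean of the canopy members) per canopy."""
    assert t1 > t2
    remaining = list(points)
    centers = []
    while remaining:
        seed = remaining.pop(0)
        # Points within t1 of the seed belong to this canopy.
        members = [seed] + [p for p in remaining if dist(seed, p) < t1]
        centers.append(mean(members))
        # Points within t2 of the seed can no longer start a new canopy.
        remaining = [p for p in remaining if dist(seed, p) >= t2]
    return centers

def kmeans(points, centers, max_iter=50, tol=1e-6):
    """Lloyd's K-means started from the supplied centers."""
    centers = list(centers)
    nearest = lambda p: min(range(len(centers)), key=lambda i: dist(p, centers[i]))
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            clusters[nearest(p)].append(p)
        # Keep the old center if a cluster ends up empty.
        updated = [mean(m) if m else centers[i] for i, m in enumerate(clusters)]
        shift = max(dist(a, b) for a, b in zip(centers, updated))
        centers = updated
        if shift < tol:
            break
    return centers, [nearest(p) for p in points]

def purity(labels, truth):
    """Fraction of points that fall in the majority true class of their cluster."""
    by_cluster = {}
    for label, t in zip(labels, truth):
        by_cluster.setdefault(label, []).append(t)
    majority = sum(Counter(ts).most_common(1)[0][1] for ts in by_cluster.values())
    return majority / len(labels)

# Hypothetical toy data: four "normal" and three "abnormal" load vectors.
points = [(1.0, 1.1), (0.9, 1.0), (1.1, 0.9), (1.0, 1.0),
          (5.0, 5.1), (5.1, 4.9), (4.9, 5.0)]
truth = [0, 0, 0, 0, 1, 1, 1]

seeds = canopy_centers(points, t1=3.0, t2=1.5)  # thresholds are illustrative
centers, labels = kmeans(points, seeds)
score = purity(labels, truth)
```

Because Canopy supplies both the number of clusters and starting centers that already sit near the dense regions, K-means typically needs fewer iterations than with random initialization, which is the time saving the abstract reports.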

Parallel power load abnormalities detection using fast density peak clustering with a hybrid canopy-K-means algorithm
