Abstract: This study detects parallel power load abnormalities with a fast density peak clustering technique that combines the Canopy and K-means algorithms within Apache Mahout's distributed machine-learning environment. It relies on Apache Hadoop's data storage and processing tools, HDFS and MapReduce, to manage and analyze the data at scale. Canopy clustering serves as a preprocessing step that quickly produces an initial partitioning of the data points, which K-means then refines to improve clustering performance. The hybrid algorithm was implemented to minimise the time needed to process the massive number of detected parallel power load abnormalities. Data vectors are generated based on the time needed, sequential and parallel candidate feature data are obtained, and the data rate is combined. After the time set is classified using Canopy with K-means and a factor-weighted vector representation, the clustering quality is assessed using purity, precision, recall, and F value. The experimental results show that using Canopy as a preprocessing step markedly reduces both the time needed to process the large number of power load abnormalities found in parallel in a fast density peak dataset and the running time of the K-means algorithm. The tests also show that the combined Canopy and K-means algorithm performs consistently and dependably on the Hadoop platform, and that its clustering results offer a scalable and effective solution for power system monitoring.
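
The Canopy-then-K-means pipeline described in the abstract can be illustrated with a minimal single-machine sketch. This is not the Mahout or MapReduce implementation used in the study: the function names (`canopy_centers`, `kmeans`, `purity`), the tight threshold `t2`, and the toy load vectors are assumptions made purely for illustration. Canopy's loose threshold T1 is omitted because only the canopy centers are needed to seed K-means here.

```python
import math
import random
from collections import Counter


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopy_centers(points, t2, seed=0):
    """Pick coarse centers: each chosen point removes all points within t2 of it."""
    rng = random.Random(seed)
    remaining = list(points)
    rng.shuffle(remaining)
    centers = []
    while remaining:
        center = remaining.pop()
        centers.append(center)
        # Points closer than t2 can no longer start a canopy of their own.
        remaining = [p for p in remaining if dist(p, center) > t2]
    return centers


def kmeans(points, centers, max_iter=100, tol=1e-9):
    """Lloyd's K-means, seeded with the given centers (here: canopy centers)."""
    centers = [list(c) for c in centers]
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        new_centers = []
        for old, members in zip(centers, clusters):
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(old)  # keep an empty cluster's center in place
        shift = max(dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    labels = [min(range(len(centers)), key=lambda i: dist(p, centers[i])) for p in points]
    return centers, labels


def purity(labels, truth):
    """Fraction of points that belong to the majority true class of their cluster."""
    by_cluster = {}
    for label, t in zip(labels, truth):
        by_cluster.setdefault(label, []).append(t)
    majority = sum(Counter(ts).most_common(1)[0][1] for ts in by_cluster.values())
    return majority / len(labels)


# Hypothetical two-feature load vectors: three normal readings, three abnormal ones.
points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (10.0, 10.0), (10.2, 9.8), (9.9, 10.1)]
truth = [0, 0, 0, 1, 1, 1]

seeds = canopy_centers(points, t2=3.0)   # Canopy fixes both k and the initial centers
centers, labels = kmeans(points, seeds)  # K-means refines the coarse partition
score = purity(labels, truth)
```

In this sketch Canopy determines both the number of clusters and the starting centers, so K-means begins from a reasonable partition rather than from random seeds, which is the intuition behind the reduced running time reported above. Purity is only one of the four measures named in the abstract; precision, recall, and F value would be computed from the same cluster and class labels.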

A DP Canopy K-Means Algorithm for Privacy Preservation of Hadoop Platform

Privacy Preserving Distributed DBSCAN Clustering

Distributed Privacy-Aware Fast Selection Algorithm for Large-Scale Data

UPA: an Automated, Accurate and Efficient Differentially Private Big-Data Mining System

GAPBAS: Genetic Algorithm-based Privacy Budget Allocation Strategy in Differential Privacy K-Means Clustering Algorithm

Privacy-Preserving and Outsourced Multi-User k-Means Clustering

k-Means SubClustering: A Differentially Private Algorithm with Improved Clustering Quality

PPA-DBSCAN: Privacy-preserving ρ-Approximate Density-based Clustering

Improved K-means algorithm based on density Canopy

Differentially Private k-Means Clustering with Guaranteed Convergence

Distributed K-Means Clustering Guaranteeing Local Differential Privacy

Differentially private k-center problems

PPHOPCM: Privacy-Preserving High-order Possibilistic C-Means Algorithm for Big Data Clustering with Cloud Computing

Privacy-Preserving Accelerated Clustering for Data Encrypted by Different Keys

Privacy Preserving Multi-Server k-means Computation over Horizontally Partitioned Data

KD3 Scheme for Privacy Preserving Data Mining

Achieving data utility-privacy tradeoff in Internet of Medical Things: A machine learning approach

Parallel power load abnormalities detection using fast density peak clustering with a hybrid canopy-K-means algorithm

Research and implementation of user clustering based on MapReduce in multimedia big data

Privacy-Preserving Machine Learning Algorithms for Big Data Systems

FastLloyd: Federated, Accurate, Secure, and Tunable k-Means Clustering with Differential Privacy