Parallel Fuzzy C-Means Clustering Based Big Data Anonymization Using Hadoop MapReduce

Lawrance, Josephine Usha
DOI: https://doi.org/10.1007/s11277-024-11101-7
IF: 2.017
2024-05-15
Wireless Personal Communications
Abstract: The amount of data on the internet is steadily growing due to recent technological advancements in cyber-physical-social systems, sensor networks, and communication technologies. Many information scientists, policymakers, and decision-makers are attempting to explore this vast amount of data to support critical decisions and planned business moves. The growth of big data also increases privacy issues and data breaches. Proper data management is essential for all organizations that handle sensitive information and large volumes of data. Data anonymization is a promising method for protecting individual privacy, but it can result in significant information loss. Recently, data anonymization based on data mining techniques has shown significant improvement in data utility. However, when applied to big data, clustering-based anonymization has serious scalability issues, and cluster formation on large data sets is time-consuming. This paper proposes the Parallel Fuzzy C-Means Clustering based Anonymization Algorithm (FCMCAA), built on the Hadoop MapReduce framework, for ensuring the privacy of large volumes of structured data. The results demonstrate that the algorithm performs better in terms of F-measure and classification accuracy, yielding 91% accuracy. It is also scalable and able to handle huge volumes of structured data while maintaining a high level of privacy with minimal information loss.
telecommunications