Abstract: With the widespread adoption of smart grids and the Internet of Things, the volume of electricity consumption data generated by energy metering equipment keeps growing. The traditional processing model struggles to store, process, and analyze this massive electricity consumption information and has gradually become the performance bottleneck of the system. Hadoop, with its distributed storage and parallel computing, can remove this bottleneck. To improve Hadoop's fault tolerance in a production environment, QJM (Quorum Journal Manager) is adopted as the high-availability shared storage mechanism, and a ZooKeeper cluster is introduced to coordinate switching between the active and standby nodes, achieving smooth failover and fault avoidance. To fully explore the potential value of massive electricity consumption information, the Hive data warehouse is used for complex queries and multi-dimensional analysis. Based on the resulting statistics, a gray model is used to predict the future trend of selected electricity usage data. An end-to-end pipeline of real-time collection of massive electricity consumption information, data cleaning and classification preprocessing, HDFS cloud storage, MapReduce parallel computing, and HiveQL statistical analysis shows that the proposed scheme is efficient and feasible. It also verifies that the gray model can predict future power consumption trends well under the Hadoop HA (High Availability) architecture.
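
The abstract does not say which gray model variant is used. As a minimal sketch, assuming the common GM(1,1) form, the prediction step could look like the following. The function names `gm11_fit` and `gm11_forecast` and the sample series are hypothetical and are not taken from the paper.

```python
import math


def gm11_fit(series):
    """Estimate the GM(1,1) development coefficient a and grey input b.

    Assumption: the paper's "gray model" is GM(1,1); the abstract does not say.
    """
    n = len(series)
    if n < 4:
        raise ValueError("GM(1,1) is usually fitted on at least 4 points")
    if any(x <= 0 for x in series):
        raise ValueError("GM(1,1) expects a positive series")

    # 1-AGO: accumulate the raw series to damp random fluctuation.
    x1 = []
    total = 0.0
    for x in series:
        total += x
        x1.append(total)

    # Background values: means of consecutive accumulated values.
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]
    y = list(series[1:])

    # Ordinary least squares for the grey equation y = -a * z + b.
    m = len(z)
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(v * w for v, w in zip(z, y))
    slope = (m * szy - sz * sy) / (m * szz - sz * sz)
    intercept = (sy - slope * sz) / m
    return -slope, intercept


def gm11_forecast(series, steps):
    """Forecast the next `steps` values of `series` with GM(1,1)."""
    a, b = gm11_fit(series)
    if abs(a) < 1e-12:
        raise ValueError("development coefficient is ~0; series is flat")
    n = len(series)

    def x1_hat(k):
        # Time response of the accumulated series.
        return (series[0] - b / a) * math.exp(-a * k) + b / a

    # Inverse accumulation: difference consecutive accumulated predictions.
    return [x1_hat(k) - x1_hat(k - 1) for k in range(n, n + steps)]


if __name__ == "__main__":
    # Hypothetical daily consumption totals (kWh), e.g. aggregated by a HiveQL query.
    daily_kwh = [100 * 1.1 ** k for k in range(6)]
    print(gm11_forecast(daily_kwh, 2))
```

GM(1,1) suits short, roughly exponential series, which is one plausible reason to apply it to aggregated statistics rather than to raw meter readings.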

Analysis and Prediction of Massive Electricity Information Based on Hadoop HA Architecture
