Kubeflow-based Automatic Data Processing Service for Data Center of State Grid Scenario

Chongyou Xu, Guangxian Lv, Jian Du, Lei Chen, Yu Huang, Wang Zhou
DOI: https://doi.org/10.1109/ISPA-BDCloud-SocialCom-SustainCom52081.2021.00130
2021-01-01
Abstract: With the rapid development of machine learning and deep learning, more and more machine learning and deep learning applications have appeared in the power grid business. Data processing in the State Grid business is complicated and cumbersome, and the reuse rate of data processing code is very low. To solve these problems, this paper proposes an efficient automated data processing service, EADP (Efficient Automated Data Processing). The EADP service is built on Kubeflow. Kubeflow/Pipeline is Google's open-source workflow for building end-to-end services. Users can package their code as a Pipeline or Component for use by Kubeflow/Pipeline. However, constructing Components and Pipelines in Kubeflow is extremely cumbersome, and Kubeflow lacks management of Components and Pipelines. To address this, EADP provides functions for automatically constructing Components and data processing DAGs. There is a one-to-one correspondence between a Component and a Docker image. Each Docker image contains a code block for data processing, which can be run once the image is instantiated. A data processing flow can be constructed as a data processing DAG. The DAG is composed of Components, and each node in the DAG corresponds to one Component. EADP uses a topological sorting algorithm to convert the data processing DAG into a Kubeflow/Pipeline, thereby realizing automated data processing. Experiments show that EADP is stable and convenient to use and can greatly shorten the time spent on data processing.
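The abstract does not say which topological sorting algorithm EADP uses or how the DAG is represented. As an illustration only, the following is a minimal sketch of one common choice, Kahn's algorithm, applied to a small data processing DAG. The function name `topological_order`, the adjacency-list format, and the component names (`load`, `clean`, `validate`, `merge`) are all hypothetical and do not come from the paper.

```python
from collections import deque


def topological_order(dag):
    """Return component names so that each one appears after its upstream components.

    `dag` maps a component name to the list of components that consume its output.
    This representation is an assumption made for the sketch, not EADP's format.
    """
    # Collect every node, including those that only appear as a downstream target.
    nodes = set(dag)
    for downstream in dag.values():
        nodes.update(downstream)

    # Count incoming edges for each node.
    indegree = {node: 0 for node in nodes}
    for downstream in dag.values():
        for node in downstream:
            indegree[node] += 1

    # Start from components with no upstream dependency (sorted for determinism).
    ready = deque(sorted(n for n, d in indegree.items() if d == 0))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for child in dag.get(node, []):
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)

    # If some nodes were never released, the graph has a cycle.
    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle, so it is not a DAG")
    return order


# Hypothetical data processing flow: load feeds clean and validate, both feed merge.
example_dag = {
    "load": ["clean", "validate"],
    "clean": ["merge"],
    "validate": ["merge"],
    "merge": [],
}
steps = topological_order(example_dag)
print(steps)
```

In a service like the one described, an ordering of this kind could be used to declare one pipeline step per Component, with each step wired to run after its upstream steps. How EADP actually emits the Kubeflow/Pipeline definition is not specified in the abstract.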