OPTIMIZATION FOR SPARK MISSION PERFORMANCE BASED ON DATA CHARACTERISTICS

Ning Chai, Yijian Wu, Wenyun Zhao
DOI: https://doi.org/10.3969/j.issn.1000-386x.2018.01.009
2018-01-01
Abstract: New-generation distributed data processing frameworks greatly improve the efficiency of data processing tasks. However, because data sets differ in their characteristics, it is difficult to find a single way to optimize the performance of such tasks. To make full use of memory and computing resources and to improve task execution efficiency, the characteristics of the data being processed must be analyzed. In this paper, we study data skew as a data characteristic and propose a method for quantifying it. Building on the distributed processing framework Spark, we combine data sampling analysis with semantic analysis of the source code to automatically determine the skew of the data set currently being processed and, based on the result, automatically optimize the corresponding program code, thereby improving the operational efficiency of the task. A number of data processing experiments verify the effectiveness of the method.
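The abstract does not state how skew is quantified, so the following is only an illustrative sketch of the general idea of estimating key skew from a sample. The metric used here (count of the most frequent sampled key divided by the mean count per distinct sampled key) and the names `skew_ratio`, `sample_fraction`, and `seed` are assumptions made for illustration, not the paper's method. Plain Python lists stand in for a Spark data set so that the example runs without a cluster.

```python
import random
from collections import Counter


def skew_ratio(keys, sample_fraction=0.1, seed=0):
    """Estimate key skew from a random sample of the keys.

    Hypothetical metric (not taken from the paper): the count of the most
    frequent sampled key divided by the mean count per distinct sampled key.
    A value near 1 suggests evenly distributed keys; a large value suggests
    that a few keys dominate.
    """
    rng = random.Random(seed)
    # Bernoulli sampling: keep each key independently with probability sample_fraction.
    sample = [k for k in keys if rng.random() < sample_fraction]
    if not sample:
        return 0.0
    counts = Counter(sample)
    mean_count = len(sample) / len(counts)
    return max(counts.values()) / mean_count


# Illustrative inputs: 100 evenly used keys versus one dominant "hot" key.
uniform_keys = [i % 100 for i in range(100_000)]
skewed_keys = ["hot"] * 90_000 + [i % 100 for i in range(10_000)]

print("uniform:", skew_ratio(uniform_keys))
print("skewed:", skew_ratio(skewed_keys))
```

In a Spark job, a comparable estimate could be taken from a sampled subset of the keys before a shuffle-heavy operation, and a high ratio could then trigger a skew-specific rewrite of the program. Which rewrite the authors apply is not described in the abstract.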