Missing Data Filling Algorithm for Big Data-Based Map-Reduce Technology

Fugui Li, Ashutosh Sharma
DOI: https://doi.org/10.4018/ijec.304036
2022-07-29
International Journal of e-Collaboration
Abstract: In big data, the large number of missing values poses a serious obstacle to reaching correct decisions. Missing values degrade the quality of information queries, distort data mining and analysis, and lead to misguided decisions. To address missing values in real databases, we pre-populate the missing data and fill in the classification attributes using probabilistic reasoning. The reasoning process is carried out in a Bayesian network and is parallelized to handle big data. The proposed algorithm is presented within the Map-Reduce framework. Experimental results show that the Bayesian network construction method and probabilistic inference are effective for processing classification data, and confirm the algorithm's parallelism on Hadoop.
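The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes the simplest possible Bayesian network (one parent attribute, one child attribute), simulates the map and reduce steps in plain Python rather than on Hadoop, and fills a missing categorical value with the most probable value given its parent. All names (`map_counts`, `reduce_counts`, `build_cpt`, `fill_missing`) and the toy `weather`/`traffic` records are hypothetical.

```python
from collections import Counter, defaultdict
from functools import reduce


def map_counts(partition, parent, child):
    """Map step: count (parent value, child value) pairs in complete records."""
    counts = Counter()
    for row in partition:
        if row.get(parent) is not None and row.get(child) is not None:
            counts[(row[parent], row[child])] += 1
    return counts


def reduce_counts(a, b):
    """Reduce step: merge the partial counts of two partitions."""
    return a + b


def build_cpt(counts):
    """Normalize joint counts into a conditional table P(child | parent)."""
    totals = defaultdict(int)
    for (p, _), n in counts.items():
        totals[p] += n
    cpt = defaultdict(dict)
    for (p, c), n in counts.items():
        cpt[p][c] = n / totals[p]
    return cpt


def fill_missing(rows, parent, child, cpt):
    """Fill a missing child value with the most probable value given the parent."""
    filled = []
    for row in rows:
        row = dict(row)  # copy so the input records are left untouched
        if row.get(child) is None and row.get(parent) in cpt:
            dist = cpt[row[parent]]
            row[child] = max(dist, key=dist.get)
        filled.append(row)
    return filled


# Hypothetical toy data split into two partitions, as a cluster would split it.
partitions = [
    [
        {"weather": "rain", "traffic": "heavy"},
        {"weather": "rain", "traffic": "heavy"},
        {"weather": "sun", "traffic": "light"},
    ],
    [
        {"weather": "rain", "traffic": "light"},
        {"weather": "sun", "traffic": "light"},
        {"weather": "rain", "traffic": None},  # value to be filled
    ],
]

partial = [map_counts(p, "weather", "traffic") for p in partitions]
total = reduce(reduce_counts, partial, Counter())
cpt = build_cpt(total)
filled = fill_missing(partitions[1], "weather", "traffic", cpt)
print(filled[2])
```

Because the counts are plain sums, each partition can be processed independently and merged in any order, which is what makes this style of probabilistic filling a natural fit for Map-Reduce. A real Bayesian network would have several parents per attribute and would need structure learning, which this sketch leaves out.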