Data Cleaning About Student Information Based on Massive Open Online Course System.

Shengjun Yin, Yaling Yi, Hongzhi Wang
DOI: https://doi.org/10.1007/978-981-15-7981-3_3
2020-01-01
Abstract: Massive Open Online Courses (MOOCs) have recently become a major way of learning online for millions of people around the world, and they generate a large amount of data in the process. However, because of errors introduced during collection, by the system, and from other sources, these data contain various inconsistencies and missing values. To support accurate analysis, this paper studies data cleaning techniques for online open course systems, including missing-value filling for time series and rule-based correction of input errors. The data cleaning algorithm designed in this paper is divided into six parts: pre-processing, missing data processing, format and content error processing, logical error processing, irrelevant data processing, and correlation analysis. For the missing data processing part, the paper designs and implements a missing-value-filling algorithm based on time series. To handle the large number of descriptive variables in the format and content error processing module, it proposes One-Hot+J3+PCA, a scheme based on one-hot encoding and separability-based criteria. The online course data cleaning algorithm is analyzed in detail in terms of design, implementation, and testing. After extensive testing, each module functioned normally and the cleaning performance of the algorithm met expectations.
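
The abstract does not say how the time-series missing-value filling works, so the following is only a minimal sketch of one common approach: linear interpolation weighted by elapsed time. The function name fill_missing_by_time, the edge handling (copying the nearest observed value), and the sample data are all assumptions made for illustration, not the authors' algorithm.

```python
from typing import List, Optional, Sequence


def fill_missing_by_time(
    times: Sequence[float], values: Sequence[Optional[float]]
) -> List[Optional[float]]:
    """Fill None entries by linear interpolation over time.

    Assumption: `times` is sorted in strictly increasing order.
    Gaps at the start or end copy the nearest observed value.
    """
    if len(times) != len(values):
        raise ValueError("times and values must have the same length")

    # Indices of the observations that are actually present.
    known = [i for i, v in enumerate(values) if v is not None]
    filled = list(values)
    if not known:
        return filled  # nothing to interpolate from

    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            # Weight by elapsed time, not by position in the list.
            w = (times[i] - times[left]) / (times[right] - times[left])
            filled[i] = values[left] + w * (values[right] - values[left])
    return filled


# Hypothetical data: minutes of video watched per day by one student.
days = [1, 2, 3, 5, 6]
minutes = [30.0, None, 50.0, None, 80.0]
print(fill_missing_by_time(days, minutes))
```

Weighting by elapsed time rather than by list position matters when records are unevenly spaced, as in the sample above where day 4 has no record at all.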