Automatic Feature Learning for MOOC Forum Thread Classification

Lin Feng, Huimin Lu, Shenglan Liu, Guochao Liu, Sen Luo
DOI: https://doi.org/10.1145/3220199.3220201
2018-01-01
Abstract: Discussion thread classification plays an important role in Massive Open Online Course (MOOC) forums. Most existing methods in this field focus on extracting text features (e.g., keywords) from the content of discussions using NLP methods. However, the diversity of languages used in MOOC forums results in poor extensibility of these methods. To tackle this problem, we manually design 23 language-independent features related to the structure, popularity, and underlying social network of a thread. Furthermore, a hybrid model that combines Gradient Boosting Decision Tree (GBDT) with Linear Regression (LR) (GBDT + LR) is employed to reduce the cost of manual feature learning for discussion thread classification. Experiments are carried out on datasets contributed by Coursera, comprising nearly 100,000 discussion threads from 60 courses taught in 4 different languages. Results demonstrate that our method can significantly improve the performance of discussion thread classification. It is worth noting that the average AUC of our model is 0.832, outperforming the baseline by 15%.
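For readers unfamiliar with the reported metric, AUC can be read as the probability that a randomly chosen positive thread receives a higher score than a randomly chosen negative one. The sketch below computes it from ranks (the Mann-Whitney U formulation) in plain Python. The function name `auc` and the labels and scores are hypothetical illustrations, not the paper's data or code.

```python
def auc(labels, scores):
    """Rank-based AUC (Mann-Whitney U), using average ranks for tied scores.

    labels: iterable of 0/1 class labels (hypothetical example data).
    scores: iterable of classifier scores, higher means more likely positive.
    """
    pairs = sorted(zip(scores, labels))  # ascending by score
    n = len(pairs)
    ranks = [0.0] * n

    # Assign 1-based ranks; tied scores share the average of their ranks.
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")

    # U statistic of the positive class, normalised by the number of pairs.
    pos_rank_sum = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    u = pos_rank_sum - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


# Hypothetical scores for four threads, two of them positive.
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

An average AUC such as the one quoted in the abstract would then be the mean of this value over the individual evaluation sets; how the paper groups those sets is not stated in the abstract.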