Semi-Supervised Feature Selection of Educational Data Mining for Student Performance Analysis

Shanshan Yu, Yiran Cai, Baicheng Pan, Man-Fai Leung

DOI: https://doi.org/10.3390/electronics13030659

IF: 2.9

2024-02-06

Electronics

Abstract: In recent years, the informatization of the educational system has caused a substantial increase in educational data. Educational data mining can assist in identifying the factors influencing students' performance. However, two challenges have arisen in the field of educational data mining: (1) How to handle the abundance of unlabeled data? (2) How to identify the most crucial characteristics that impact student performance? In this paper, a semi-supervised feature selection framework is proposed to analyze the factors influencing student performance. The proposed method is semi-supervised, enabling the processing of a considerable amount of unlabeled data with only a few labeled instances. Additionally, by solving a feature selection matrix, the weight of each feature can be determined and used to rank its importance. Furthermore, various commonly used classifiers are employed to assess the performance of the proposed feature selection method. Extensive experiments demonstrate the superiority of the proposed semi-supervised feature selection approach. The experiments indicate that behavioral characteristics are significant for student performance, and the proposed method outperforms the state-of-the-art feature selection methods by approximately 3.9% when extracting the most important feature.

engineering, electrical & electronic; computer science, information systems; physics, applied

What problem does this paper attempt to address?

The paper addresses two key issues in Educational Data Mining (EDM):

1. **How to handle a large amount of unlabeled data?** Educational data usually includes a large amount of unlabeled data (such as students' classroom performance and discussion records). This data is rich in information but lacks corresponding labels. The proposed method can process it with the help of only a small amount of labeled data.
2. **How to identify key features that affect student performance?** Educational datasets often contain many irrelevant features, which can reduce model accuracy. Determining which features are crucial to students' academic performance is therefore an important task, and the paper proposes a semi-supervised feature selection method to identify the features with the greatest impact on students' learning outcomes.

To address these issues, the paper proposes a method called SFSGLR (Semi-Supervised Feature Selection based on Generalized Linear Regression). The method draws on semi-supervised learning, so it can handle a large amount of unlabeled data with only a small amount of labeled data. Solving for the feature selection matrix yields a weight for each feature, and the features are ranked by these weights. The paper then uses various commonly used classifiers to evaluate the effectiveness of the selected features.

Experimental results show that, when extracting the most important feature, the method outperforms existing state-of-the-art feature selection methods by approximately 3.9%. The study also found that behavioral features are particularly important for student performance, which gives educators and policymakers a basis for developing targeted teaching strategies and interventions.
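The general idea of ranking features from a learned weight matrix can be illustrated with a small, dependency-free sketch. This is not the authors' SFSGLR algorithm: as stand-ins, it assumes nearest-centroid pseudo-labeling for the unlabeled rows and plain ridge regression for the weight matrix. The toy data, the feature names, and the helpers `solve`, `pseudo_label`, and `rank_features` are all hypothetical.

```python
import math

def solve(A, B):
    """Solve A @ W = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in A[i]] + [float(v) for v in B[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]

def pseudo_label(X, y):
    """Give each unlabeled row (label None) the class of the nearest labeled centroid."""
    classes = sorted({c for c in y if c is not None})
    centroids = {}
    for c in classes:
        rows = [x for x, lab in zip(X, y) if lab == c]
        centroids[c] = [sum(col) / len(rows) for col in zip(*rows)]
    labels = [
        lab if lab is not None
        else min(classes, key=lambda c: math.dist(x, centroids[c]))
        for x, lab in zip(X, y)
    ]
    return labels, classes

def rank_features(X, labels, classes, lam=1.0):
    """Ridge regression W = (Xc^T Xc + lam*I)^-1 Xc^T Y; score feature j by ||W[j]||_2."""
    n, d = len(X), len(X[0])
    means = [sum(col) / n for col in zip(*X)]
    Xc = [[v - m for v, m in zip(row, means)] for row in X]          # centered features
    Y = [[1.0 if lab == c else 0.0 for c in classes] for lab in labels]  # one-hot labels
    A = [[sum(Xc[k][i] * Xc[k][j] for k in range(n)) + (lam if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    B = [[sum(Xc[k][i] * Y[k][c] for k in range(n)) for c in range(len(classes))]
         for i in range(d)]
    W = solve(A, B)                                                  # d x c weight matrix
    scores = [math.sqrt(sum(w * w for w in row)) for row in W]
    order = sorted(range(d), key=lambda j: scores[j], reverse=True)
    return order, scores

# Hypothetical toy data: 8 students, 3 features scaled to [0, 1]; only 2 rows are labeled.
feature_names = ["participation", "study_hours", "seat_position"]
X = [
    [0.90, 0.5, 0.3], [0.10, 0.4, 0.6], [0.80, 0.6, 0.5], [0.85, 0.4, 0.4],
    [0.95, 0.5, 0.6], [0.20, 0.5, 0.4], [0.15, 0.6, 0.5], [0.05, 0.4, 0.3],
]
y = ["pass", "fail", None, None, None, None, None, None]

labels, classes = pseudo_label(X, y)
order, scores = rank_features(X, labels, classes)
for j in order:
    print(f"{feature_names[j]}: {scores[j]:.4f}")
```

Scoring each feature by the norm of its row in the weight matrix is one common way to turn a regression solution into a ranking. A classifier trained on only the top-ranked features can then be compared against one trained on all features, which mirrors the evaluation step described above.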


Enhancing College Student Education and Management through Semisupervised Learning

Performance assessment and fitness analysis of athletes using decision tree and data mining techniques

A Method for Prediction and Analysis of Student Performance That Combines Multi-Dimensional Features of Time and Space

Enhancing Student Performance Prediction via Educational Data Mining on Academic data

Characterizing Students' Learning Behaviors Using Unsupervised Learning Methods

Genetic Algorithm Based Feature Selection With Ensemble Methods For Student Academic Performance Prediction

Educational Data Mining Techniques for Student Performance Prediction: Method Review and Comparison Analysis

Discriminable Multi-Label Attribute Selection for Pre-Course Student Performance Prediction

An Effective Learning Management System for Revealing Student Performance Attributes

E-Learning Performance Prediction: Mining the Feature Space of Effective Learning Behavior

Improving Predictive Modeling for At-Risk Student Identification: A Multistage Approach.

A Study on Feature Selection Techniques in Educational Data Mining

Research on Data Mining Combination Model Analysis and Performance Prediction Based on Students’ Behavior Characteristics

Predicting Academic Performance for College Students: A Campus Behavior Perspective

Data Mining Algorithm for College Students’ Mental Health Questionnaire Based on Semisupervised Deep Learning Method

Mining and Application of Digital Health Elements in Higher Education Student Management and Education

A University Student Performance Prediction Model and Experiment Based on Multi-Feature Fusion and Attention Mechanism

The Exploration of Modelling for The Student Achievement Predictor

Investigation and the development of learning analytics dashboard in open and distance learning using big data mining

Research on Educational Data Mining for Online Intelligent Learning