Identifying key factors of student academic performance by subgroup discovery

Sumyea Helal, Jiuyong Li, Lin Liu, Esmaeil Ebrahimie, Shane Dawson, Duncan J. Murray
DOI: https://doi.org/10.1007/s41060-018-0141-y
2018-06-21
International Journal of Data Science and Analytics
Abstract: Identifying the factors that influence student academic performance is essential for providing timely and effective support interventions. The data collected during enrolment and after commencement of a course provide an important source of information for identifying potential risk indicators associated with poor academic performance and attrition. Both predictive and descriptive data mining techniques have been applied to educational data to discover the significant reasons behind student performance. Each has its own advantages and limitations. For example, predictive techniques tend to maximise classification accuracy, while descriptive techniques simply search for interesting student features without considering academic outcome. Subgroup discovery is a data mining method that combines the advantages of both predictive and descriptive approaches. This study uses subgroup discovery to extract significant factors of student performance for a given outcome (Pass or Fail). We use student demographic and academic data recorded at enrolment, together with course assessment and participation data retrieved from the institution's learning management system (Moodle), to detect the factors affecting student performance. The results demonstrate the general effectiveness of subgroup discovery in identifying these factors, and show the pros and cons of some popular subgroup discovery algorithms used in this research. The experiments show that students who come from a disadvantaged socio-economic background or were admitted under a special entry requirement are the most likely to fail. The experiments on Moodle data reveal that students with a lower level of access to the course resources and forum are more likely to be unsuccessful.
From the combined data, we identified some interesting subgroups that are not detected using enrolment or Moodle data separately. Students who study off-campus or part-time and contribute little to the course learning activities are more likely to be low-performing.
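
To make the idea of subgroup discovery concrete, the following is a minimal sketch, not the authors' implementation. It assumes weighted relative accuracy (WRAcc), a commonly used subgroup quality measure; the abstract does not say which measure or algorithms the study relies on. The records, attribute names (`mode`, `load`, `forum`) and the functions `wracc` and `discover` are hypothetical and exist only for illustration.

```python
from itertools import combinations

# Toy, made-up records for illustration only (NOT data from the study).
students = [
    {"mode": "off-campus", "load": "part-time", "forum": "low",  "outcome": "Fail"},
    {"mode": "off-campus", "load": "part-time", "forum": "low",  "outcome": "Fail"},
    {"mode": "off-campus", "load": "full-time", "forum": "low",  "outcome": "Fail"},
    {"mode": "on-campus",  "load": "part-time", "forum": "high", "outcome": "Pass"},
    {"mode": "on-campus",  "load": "full-time", "forum": "high", "outcome": "Pass"},
    {"mode": "on-campus",  "load": "full-time", "forum": "high", "outcome": "Pass"},
    {"mode": "off-campus", "load": "full-time", "forum": "high", "outcome": "Pass"},
    {"mode": "on-campus",  "load": "full-time", "forum": "low",  "outcome": "Pass"},
]


def wracc(records, conditions, target=("outcome", "Fail")):
    """Weighted relative accuracy of the subgroup described by `conditions`.

    WRAcc = coverage * (target share in subgroup - target share overall).
    `conditions` is an iterable of (attribute, value) pairs, all of which
    must hold for a record to belong to the subgroup.
    """
    n = len(records)
    covered = [r for r in records if all(r[a] == v for a, v in conditions)]
    if not covered:
        return 0.0
    key, value = target
    p_all = sum(r[key] == value for r in records) / n
    p_sub = sum(r[key] == value for r in covered) / len(covered)
    return (len(covered) / n) * (p_sub - p_all)


def discover(records, attributes, max_len=2, top_k=3):
    """Exhaustively score all conjunctions of up to `max_len` selectors."""
    selectors = sorted({(a, r[a]) for r in records for a in attributes})
    scored = []
    for k in range(1, max_len + 1):
        for conds in combinations(selectors, k):
            # Skip descriptions that constrain the same attribute twice.
            if len({a for a, _ in conds}) < k:
                continue
            scored.append((wracc(records, conds), conds))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


top = discover(students, ["mode", "load", "forum"])
for quality, description in top:
    print(round(quality, 4), " AND ".join(f"{a}={v}" for a, v in description))
```

An exhaustive search like this is only practical for a handful of attributes; practical subgroup discovery algorithms prune or sample the search space, but the scoring step follows the same pattern of rewarding subgroups that are both large and unusually concentrated on the target outcome.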