Leandro M. de Lima, Renato A. Krohling
Abstract: High dropout rates in tertiary education expose a lack of efficiency that causes frustrated expectations and financial waste. Predicting which students are at risk is not enough to prevent dropout. Usually, an appropriate aid action must be identified and applied at the right time for each student. To tackle this sequential decision-making problem, we propose a decision support method for selecting aid actions for students using offline reinforcement learning, to help decision-makers effectively prevent student dropout. Additionally, we evaluate a discretization of the students' state space using two different clustering methods. Our experiments using logged data of real students show, through off-policy evaluation, that the method should achieve roughly 1.0 to 1.5 times as much cumulative reward as the logged policy. Thus, it is feasible to help decision-makers apply appropriate aid actions and, possibly, reduce student dropout.
What problem does this paper attempt to address?
The paper aims to give educational institutions a decision-support system, based on offline reinforcement learning, for selecting the most appropriate actions to assist students and thereby reduce dropout. Specifically, the paper aims to:
1. **Go beyond identifying students at risk of dropping out**: Rather than only predicting which students are at risk, determine for each student when to intervene and which aid action is most likely to help.
2. **Optimize the choice and timing of aid actions**: Use offline reinforcement learning to help decision-makers select and apply appropriate aid actions at the right time, so as to reduce dropout.
3. **Evaluate how the clustering method used for state-space discretization affects performance**: Discretize the student state space with two clustering algorithms, X-means and OPTICS, and compare their effect on model performance.
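To make the discretization step concrete, here is a minimal sketch assuming each student is described by a small numeric feature vector; the features, values, and the helper names `kmeans`, `nearest`, and `discretize` are hypothetical. It uses plain k-means with a fixed k as a stand-in: X-means extends k-means by also choosing the number of clusters with a BIC-based splitting criterion, and OPTICS is a density-based method, so neither is reproduced here. Each student's cluster index becomes their discrete MDP state.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def nearest(p: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to p (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))


def kmeans(points: Sequence[Point], k: int, iters: int = 100) -> List[Point]:
    """Lloyd's k-means with a fixed k; a simplified stand-in for X-means."""
    # Deterministic (but naive) initialization: the first k points.
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iters):
        # Assignment step: group every point under its nearest centroid.
        groups: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Update step: move each centroid to its group's mean (keep it if empty).
        updated = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else c
            for c, g in zip(centroids, groups)
        ]
        if updated == centroids:  # converged: centroids stopped moving
            break
        centroids = updated
    return centroids


def discretize(points: Sequence[Point], centroids: Sequence[Point]) -> List[int]:
    """Map each continuous student state to a discrete state id (its cluster index)."""
    return [nearest(p, centroids) for p in points]


# Hypothetical features per student: (normalized grade average, share of failed courses).
students = [(0.9, 0.0), (0.85, 0.05), (0.2, 0.6), (0.25, 0.7), (0.88, 0.02), (0.15, 0.65)]
centroids = kmeans(students, k=2)
states = discretize(students, centroids)
print(states)  # [0, 0, 1, 1, 0, 1]
```

Initializing centroids from the first k points keeps the sketch deterministic but is fragile; a real pipeline would more likely rely on a library implementation with better seeding and, for X-means, automatic selection of k.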
### Problem Background
The global gross enrollment ratio in higher education rose from 19% in 2000 to 38% in 2018, but expanding the higher education system does not necessarily increase the number of graduates. Especially in economically vulnerable developing regions such as Latin America, it is crucial to turn higher enrollment into a supply of highly skilled labor. Student dropout not only wastes resources and frustrates expectations but may also cause a loss of personal, professional, and social potential. Effective policies and actions to prevent dropout are therefore crucial.
### Method Overview
The paper proposes a decision-support method based on offline reinforcement learning. Its main components are:
- **Markov Decision Process (MDP) Modeling**: Model student dropout as a fully observable MDP, where the state space \( S \) describes the students' situations, the action space \( A \) contains the available aid actions, the transition function \( P(s_{t+1} \mid s_t, a_t) \) gives the probability of moving to the next state after taking an action, and the reward function \( R(s_t, a_t) \) gives the immediate reward for taking action \( a_t \) in state \( s_t \).
- **Offline Reinforcement Learning**: Train a policy that recommends aid actions from existing historical (logged) data. Because the agent cannot interact with the environment, offline RL must handle distributional shift: the learned policy may favor actions that are rare or absent in the logged data, where value estimates are unreliable.
- **State-Space Discretization**: Map the continuous student state space to a finite set of states with clustering algorithms (X-means and OPTICS), which simplifies the problem and reduces computational cost.
- **Off-Policy Evaluation (OPE)**: Estimate the performance of the newly learned policy from logged data, using estimators such as Sequential Weighted Doubly-Robust (SWDR) and Model And Guided Importance Sampling Combining (MAGIC).
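The MDP and offline-learning steps above can be illustrated with a small tabular sketch. It assumes logged transitions of the form `(state, action, reward, next_state, done)` over discretized states, estimates empirical transition probabilities and mean rewards from counts, and runs value iteration on that estimated model. This is a generic certainty-equivalence approach, not necessarily the algorithm used in the paper. As a crude guard against distributional shift, the policy only chooses among actions that actually appear in the log for each state. All states, actions, rewards, and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# (state, action, reward, next_state, done); ids are discretized cluster indices.
Transition = Tuple[int, int, float, int, bool]
Model = Dict[Tuple[int, int], Dict[Optional[int], float]]


def estimate_model(log: List[Transition]) -> Tuple[Model, Dict[Tuple[int, int], float]]:
    """Empirical P(s'|s,a) and mean R(s,a); a terminal outcome is keyed as None."""
    counts = defaultdict(lambda: defaultdict(int))
    reward_sum: Dict[Tuple[int, int], float] = defaultdict(float)
    visits: Dict[Tuple[int, int], int] = defaultdict(int)
    for s, a, r, s_next, done in log:
        visits[(s, a)] += 1
        reward_sum[(s, a)] += r
        counts[(s, a)][None if done else s_next] += 1
    P = {sa: {s2: c / visits[sa] for s2, c in nxt.items()} for sa, nxt in counts.items()}
    R = {sa: reward_sum[sa] / visits[sa] for sa in visits}
    return P, R


def q_value(s, a, P, R, V, gamma):
    """One-step lookahead: mean reward plus discounted expected next-state value."""
    # States with no logged actions (and terminal outcomes) are valued at 0.
    return R[(s, a)] + gamma * sum(
        p * (0.0 if s2 is None else V.get(s2, 0.0)) for s2, p in P[(s, a)].items()
    )


def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Greedy policy over the estimated model, restricted to logged (s, a) pairs."""
    logged_actions = defaultdict(list)
    for s, a in R:
        logged_actions[s].append(a)
    V: Dict[int, float] = {}
    while True:
        delta = 0.0
        for s, acts in logged_actions.items():
            best = max(q_value(s, a, P, R, V, gamma) for a in acts)
            delta = max(delta, abs(best - V.get(s, 0.0)))
            V[s] = best
        if delta < tol:
            break
    policy = {
        s: max(acts, key=lambda a: q_value(s, a, P, R, V, gamma))
        for s, acts in logged_actions.items()
    }
    return policy, V


# Hypothetical log. States: 0 = at risk, 1 = stable. Actions: 0 = no aid, 1 = tutoring.
# Rewards: +1 graduation, -1 dropout (both terminal), 0 otherwise.
log = [
    (0, 0, -1.0, 0, True),
    (0, 0, -1.0, 0, True),
    (0, 1, 0.0, 1, False),
    (0, 1, -1.0, 0, True),
    (1, 0, 1.0, 1, True),
    (1, 1, 1.0, 1, True),
]
P, R = estimate_model(log)
policy, V = value_iteration(P, R)
print(policy)  # {0: 1, 1: 0}
```

In this toy log, tutoring an at-risk student is preferred because it sometimes leads to the stable state, from which graduation follows; in the stable state both actions tie and `max` keeps the first. With few samples per (state, action) pair these estimates are noisy, which is why dedicated offline RL methods use more principled safeguards than this simple action filter.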
### Experimental Results
According to off-policy evaluation on logged data of real students, the proposed offline reinforcement learning method achieves roughly 1.0 to 1.5 times as much cumulative reward as the logged policy. This suggests that the method can help decision-makers select and apply appropriate aid actions and may thereby reduce the dropout rate.
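The "times as much cumulative reward" comparison can be read as a ratio between an off-policy estimate of the learned policy's value and the average return actually observed under the logged policy. The sketch below uses trajectory-wise weighted importance sampling (WIS), a simpler relative of the SWDR and MAGIC estimators (those also incorporate a learned model of the environment); it is not the paper's evaluation code. It assumes the behavior policy's action probabilities were logged, and the trajectories, probabilities, and function names are hypothetical. The ratio is only meaningful when the logged policy's value is positive.

```python
from typing import Callable, List, Tuple

# One logged step: (state, action, reward, behavior_prob of the logged action).
Step = Tuple[int, int, float, float]


def discounted_return(traj: List[Step], gamma: float) -> float:
    return sum(gamma ** t * r for t, (_, _, r, _) in enumerate(traj))


def wis_estimate(trajs: List[List[Step]], target_prob: Callable[[int, int], float],
                 gamma: float = 1.0) -> float:
    """Trajectory-wise weighted importance sampling estimate of the target policy's value."""
    weights, returns = [], []
    for traj in trajs:
        rho = 1.0
        for s, a, _, behavior_prob in traj:
            # Likelihood ratio of the logged action under target vs. behavior policy.
            rho *= target_prob(s, a) / behavior_prob
        weights.append(rho)
        returns.append(discounted_return(traj, gamma))
    total = sum(weights)
    # Self-normalize by the sum of weights instead of the number of trajectories.
    return sum(w * g for w, g in zip(weights, returns)) / total if total > 0 else 0.0


def logged_value(trajs: List[List[Step]], gamma: float = 1.0) -> float:
    """On-policy Monte Carlo value of the behavior (logged) policy."""
    return sum(discounted_return(t, gamma) for t in trajs) / len(trajs)


def always_tutor(state: int, action: int) -> float:
    """Hypothetical learned policy: always choose action 1 (e.g. tutoring)."""
    return 1.0 if action == 1 else 0.0


# Hypothetical one-step episodes; the behavior policy chose each action with prob 0.5.
trajs = [
    [(0, 1, 1.0, 0.5)],
    [(0, 0, 0.0, 0.5)],
    [(0, 1, 1.0, 0.5)],
    [(0, 0, 1.0, 0.5)],
]
ratio = wis_estimate(trajs, always_tutor) / logged_value(trajs)
print(round(ratio, 3))  # 1.333
```

Self-normalizing introduces a small bias but usually has much lower variance than ordinary importance sampling, which matters when the learned policy rarely agrees with the logged one.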
### Summary
This research provides a new decision-support tool, based on offline reinforcement learning, to help educational institutions address student dropout. By optimizing the choice and timing of aid actions, the method may increase graduation rates and thus contribute to broader social progress.