Abstract: High dropout rates in tertiary education expose a lack of efficiency that frustrates expectations and wastes financial resources. Predicting which students are at risk is not enough to prevent dropout: usually, an appropriate aid action must be identified and applied at the right time for each student. To tackle this sequential decision-making problem, we propose a decision-support method for selecting aid actions for students, using offline reinforcement learning to help decision-makers effectively prevent student dropout. Additionally, we evaluate a discretization of the student state space using two different clustering methods. Our experiments on logged data from real students show, through off-policy evaluation, that the method should achieve roughly 1.0 to 1.5 times as much cumulative reward as the logged policy. It is therefore feasible to help decision-makers apply appropriate aid actions and, possibly, reduce student dropout.

What problem does this paper attempt to address?

This paper aims to give educational institutions a decision-support method, based on offline reinforcement learning, for selecting the most appropriate aid actions for students and thereby reducing the dropout rate. Specifically, the paper aims to:

1. **Go beyond predicting which students are at risk**: Rather than only predicting which students are at risk of dropping out, determine for each student when and what kind of assistance is most effective.
2. **Optimize the selection and timing of aid actions**: Use offline reinforcement learning algorithms to help decision-makers select and apply appropriate assistance measures at the right time, so as to minimize student dropout.
3. **Evaluate the impact of different clustering methods on state-space discretization**: Use two clustering algorithms, X-means and OPTICS, to discretize the student state space and evaluate their impact on model performance.

### Problem Background

The global gross enrollment rate in higher education increased from 19% in 2000 to 38% in 2018, but expanding the higher education system does not necessarily increase the number of graduates. Especially in economically vulnerable developing regions such as Latin America, it is crucial to convert higher enrollment into a supply of highly skilled labor. Student dropout not only wastes resources and frustrates expectations, but may also lead to a loss of personal, professional, and social potential. Effective policies and actions are therefore crucial for preventing student dropout.

### Method Overview

The paper proposes a decision-support method based on offline reinforcement learning. The specific steps are:

- **Markov Decision Process (MDP) Modeling**: Model the student dropout problem as a fully observable MDP, where the state space \( S \) represents the states of students, the action space \( A \) represents the assistance measures that can be taken, the transition function \( P(s_{t+1}|s_t,a_t) \) gives the probability of entering the next state after taking an action, and the reward function \( R(s_t,a_t) \) gives the immediate feedback after taking an action.
- **Offline Reinforcement Learning**: Use existing historical (logged) data to train a policy that recommends assistance measures. Because the learner cannot interact with the environment, offline RL must handle the distribution shift between the logged data and the learned policy.
- **State-Space Discretization**: Discretize the continuous state space with clustering algorithms (X-means and OPTICS) to simplify the problem and improve computational efficiency.
- **Off-Policy Evaluation (OPE)**: Use methods such as Sequential Weighted Doubly-Robust (SWDR) and Model and Guided Importance Sampling Combining (MAGIC) to evaluate the performance of the newly learned policy.

### Experimental Results

The off-policy evaluation indicates that the learned policy should achieve roughly 1.0 to 1.5 times the cumulative reward of the logged policy. This suggests that the method can help decision-makers select and apply appropriate assistance measures more effectively, and thus may reduce the student dropout rate.

### Summary

This research provides a new decision-support tool, based on offline reinforcement learning, to help educational institutions address student dropout. By optimizing the selection and timing of aid actions, the method may help increase graduation rates.
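As an illustration of the MDP modeling and offline learning steps in the Method Overview, the following is a minimal sketch, not the paper's actual algorithm. It assumes the student states have already been discretized into cluster indices, estimates \( P \) and \( R \) from logged `(state, action, reward, next_state)` tuples by simple counting, and runs value iteration restricted to state-action pairs that appear in the log (a crude way to avoid unseen actions). The state and action meanings, the reward values, and the names `estimate_mdp`, `q_values`, and `value_iteration` are all hypothetical.

```python
from collections import defaultdict

def estimate_mdp(transitions):
    """Estimate P(s'|s,a) and R(s,a) by counting logged (s, a, r, s') tuples."""
    counts = defaultdict(lambda: defaultdict(int))
    reward_sum = defaultdict(float)
    for s, a, r, s_next in transitions:
        counts[(s, a)][s_next] += 1
        reward_sum[(s, a)] += r
    P, R = {}, {}
    for sa, nexts in counts.items():
        total = sum(nexts.values())
        P[sa] = {s2: c / total for s2, c in nexts.items()}
        R[sa] = reward_sum[sa] / total
    return P, R

def q_values(s, V, P, R, n_actions, gamma):
    """Q(s, a) for every action that was actually logged in state s."""
    return {
        a: R[(s, a)] + gamma * sum(p * V[s2] for s2, p in P[(s, a)].items())
        for a in range(n_actions)
        if (s, a) in P
    }

def value_iteration(P, R, n_states, n_actions, gamma=0.9, tol=1e-8):
    """Value iteration over the estimated tabular MDP."""
    V = [0.0] * n_states
    while True:
        delta = 0.0
        for s in range(n_states):
            q = q_values(s, V, P, R, n_actions, gamma)
            if not q:  # no logged data for s (e.g. terminal): keep V[s] = 0
                continue
            best = max(q.values())
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = {}
    for s in range(n_states):
        q = q_values(s, V, P, R, n_actions, gamma)
        if q:
            policy[s] = max(q, key=q.get)
    return V, policy

# Hypothetical toy log. States: 0 = at risk, 1 = on track, 2 = dropped out.
# Actions: 0 = no aid, 1 = tutoring. Rewards are made up for illustration.
logged = [
    (0, 0, 0.0, 2), (0, 0, 1.0, 0),
    (0, 1, 1.0, 1), (0, 1, 1.0, 1),
    (1, 0, 1.0, 1), (1, 0, 1.0, 1),
    (1, 1, 0.5, 1),
]
P, R = estimate_mdp(logged)
V, policy = value_iteration(P, R, n_states=3, n_actions=2)
print(policy)
```

On this toy log the greedy policy recommends tutoring for the at-risk state and no aid for the on-track state. A policy learned this way would still need off-policy evaluation before being trusted, since the value estimates come only from the logged data.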

Discovering an Aid Policy to Minimize Student Evasion Using Offline Reinforcement Learning
