Abstract: In interactive e-learning environments such as Intelligent Tutoring Systems, pedagogical decisions can be made at different levels of granularity. In this work, we focus on decisions at two levels, whole problems and single steps, and explore three types of granularity: problem-level only (Prob-Only), step-level only (Step-Only), and both problem and step levels (Both). More specifically, for Prob-Only, our pedagogical agent decides whether the next problem should be a worked example (WE) or a problem-solving task (PS). In WEs, students observe how the tutor solves a problem, while in PSs students solve the problem themselves. For Step-Only, the agent decides whether to elicit the student's next solution step or to tell the step directly. Here the student and the tutor co-construct the solution, and we refer to this type of task as collaborative problem-solving (CPS). For Both, the agent first decides whether the next problem should be a WE, a PS, or a CPS and, based on the problem-level decision, then makes step-level decisions on whether to elicit or tell each step. In a series of classroom studies, we compare the three types of granularity under random yet reasonable pedagogical decisions. Results showed that while Prob-Only may be less effective for High students and Step-Only may be less effective for Low ones, Both can be effective for both High and Low students. Motivated by these findings, we propose and apply an offline, off-policy, Gaussian Process-based Hierarchical Reinforcement Learning (HRL) framework to induce a hierarchical pedagogical policy that makes adaptive, effective decisions at both the problem and step levels. In an empirical classroom study, our results showed that the HRL policy is significantly more effective than a Deep Q-Network (DQN) induced step-level policy and a random yet reasonable step-level baseline policy.
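The two-level decision structure described for the Both condition can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function and constant names are hypothetical, the policies are uniform random stand-ins for the "random yet reasonable" and HRL-induced policies, and it assumes that a WE tells every step while a PS elicits every step, so that per-step choices arise only in CPS.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical labels for the two decision levels described in the abstract.
PROBLEM_TYPES = ("WE", "PS", "CPS")
STEP_ACTIONS = ("elicit", "tell")


def random_problem_policy(rng: random.Random) -> str:
    """Stand-in problem-level policy: pick WE, PS, or CPS uniformly."""
    return rng.choice(PROBLEM_TYPES)


def random_step_policy(rng: random.Random) -> str:
    """Stand-in step-level policy: pick elicit or tell uniformly."""
    return rng.choice(STEP_ACTIONS)


def decide_problem(
    num_steps: int,
    rng: random.Random,
    problem_policy: Callable[[random.Random], str] = random_problem_policy,
    step_policy: Callable[[random.Random], str] = random_step_policy,
) -> Tuple[str, List[str]]:
    """Make one problem-level decision, then the step-level decisions it implies."""
    problem_type = problem_policy(rng)
    if problem_type == "WE":
        # Assumption: in a worked example the tutor tells every step.
        steps = ["tell"] * num_steps
    elif problem_type == "PS":
        # Assumption: in problem solving every step is elicited from the student.
        steps = ["elicit"] * num_steps
    else:
        # CPS: the step-level policy decides elicit vs. tell for each step.
        steps = [step_policy(rng) for _ in range(num_steps)]
    return problem_type, steps


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(decide_problem(num_steps=4, rng=rng))
```

Swapping `problem_policy` and `step_policy` for learned policies would give the hierarchical setting; fixing `problem_policy` to always return "CPS" approximates Step-Only, and restricting it to "WE" and "PS" approximates Prob-Only.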

PADDLE: Logic Program Guided Policy Reuse in Deep Reinforcement Learning

Efficient Deep Reinforcement Learning Via Adaptive Policy Transfer

Efficient Deep Reinforcement Learning Through Policy Transfer

Leveraging Efficiency Through Hybrid Prioritized Experience Replay in Door Environment

CUP: Critic-Guided Policy Reuse

Hierarchical Programmatic Reinforcement Learning via Learning to Compose Programs

Efficient Bayesian Policy Reuse with a Scalable Observation Model in Deep Reinforcement Learning

Relabeling and policy distillation of hierarchical reinforcement learning

Reinforcement Learning Experience Reuse with Policy Residual Representation

The Ladder in Chaos: A Simple and Effective Improvement to General DRL Algorithms by Policy Path Trimming and Boosting

Effective Interpretable Policy Distillation via Critical Experience Point Identification

IOB: Integrating Optimization Transfer and Behavior Transfer for Multi-Policy Reuse

Stochastic Ensemble Policy Transfer

Planning Immediate Landmarks of Targets for Model-Free Skill Transfer across Agents

Synthesizing Programmatic Reinforcement Learning Policies with Large Language Model Guided Search

Leveraging Granularity: Hierarchical Reinforcement Learning for Pedagogical Policy Induction

Deep Reinforcement Learning with Temporal Logics

LISPR: An Options Framework for Policy Reuse with Reinforcement Learning

Lifetime policy reuse and the importance of task capacity

Policy Rehearsing: Training Generalizable Policies for Reinforcement Learning

Learning Similar Tasks Based on PPO by Transferring Trajectory