Abstract: Teaching is critical to human society: it is through teaching that students are educated and human civilization is passed on and advanced. A good teacher not only provides students with suitable teaching materials (e.g., textbooks) but also sets appropriate learning objectives (e.g., course projects and exams) that account for each student's situation. In artificial intelligence, if machine learning models are treated as students, the loss functions they optimize are direct counterparts of the learning objectives set by a teacher. In this work, we explore the possibility of imitating human teaching behavior by dynamically and automatically outputting appropriate loss functions to train machine learning models. Unlike typical learning settings, in which the loss function of a machine learning model is predefined and fixed, in our framework the loss function of one machine learning model (the student) is defined by another machine learning model (the teacher). The ultimate goal of the teacher model is to cultivate a student that performs better on a development dataset. To that end, and similar to human teaching, the teacher, a parametric model, dynamically outputs different loss functions for its student model to optimize at different training stages. We develop an efficient learning method for the teacher model that makes gradient-based optimization possible, without resorting to less effective solutions such as policy optimization. We call our method "learning to teach with dynamic loss functions" (L2T-DLF for short). Extensive experiments on real-world tasks, including image classification and neural machine translation, demonstrate that our method significantly improves the quality of various student models.
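
The teacher-student loop described in the abstract can be illustrated with a toy sketch. This is not the L2T-DLF method itself: it assumes a scalar linear student, a teacher with a single parameter `phi` that shapes the student's loss as training MSE plus `exp(phi) * w**2`, and a loss that stays fixed during each student run rather than changing across training stages. All names (`train_student`, `teacher_step`, `TRAIN`, `DEV`) and the data are hypothetical. What it does show is the central idea that the teacher can be updated by gradient descent on development-set error, by differentiating through the student's training updates (here with forward-mode sensitivities, a simplification chosen for brevity).

```python
import math

# Hypothetical toy data: the training targets are noisy, the dev targets are clean.
TRAIN = [(1.0, 3.0), (2.0, 5.0), (3.0, 10.0)]
DEV = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)]


def train_student(phi, train, steps=50, lr=0.05):
    """Train a scalar student w on L_phi(w) = MSE_train(w) + exp(phi) * w**2.

    Returns the final w and dw/dphi, where the derivative is propagated
    through every gradient step of the student's training.
    """
    a = sum(x * x for x, _ in train) / len(train)  # mean of x^2
    b = sum(x * y for x, y in train) / len(train)  # mean of x*y
    lam = math.exp(phi)
    w, dw_dphi = 0.0, 0.0
    for _ in range(steps):
        grad_w = 2 * a * w - 2 * b + 2 * lam * w  # dL_phi/dw
        # Total derivative of grad_w with respect to phi (chain rule).
        dgrad_dphi = (2 * a + 2 * lam) * dw_dphi + 2 * lam * w
        w, dw_dphi = w - lr * grad_w, dw_dphi - lr * dgrad_dphi
    return w, dw_dphi


def dev_loss_and_grad(w, dev):
    """Mean squared error on the dev set and its derivative with respect to w."""
    n = len(dev)
    loss = sum((w * x - y) ** 2 for x, y in dev) / n
    grad = sum(2 * x * (w * x - y) for x, y in dev) / n
    return loss, grad


def teacher_step(phi, train, dev, teacher_lr=0.2):
    """One gradient step on the teacher parameter, using dev loss as the objective."""
    w, dw_dphi = train_student(phi, train)
    loss, dloss_dw = dev_loss_and_grad(w, dev)
    hypergrad = dloss_dw * dw_dphi  # d(dev loss)/d(phi)
    return phi - teacher_lr * hypergrad, loss


if __name__ == "__main__":
    phi = -2.0
    for step in range(40):
        phi, loss = teacher_step(phi, TRAIN, DEV)
        if step % 10 == 0:
            print(f"step {step:2d}  phi={phi:.3f}  dev_loss={loss:.4f}")
```

In this sketch the teacher's only freedom is a regularization strength, so it amounts to gradient-based hyperparameter tuning. A teacher in the sense of the abstract would be a richer parametric model whose output loss depends on the student's training stage.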

A Kernel Loss for Solving the Bellman Equation

Gradient Q(σ, λ): A Unified Algorithm with Function Approximation for Reinforcement Learning

A Neural Network Model For General Minimax Problem

Bellman Gradient Iteration for Inverse Reinforcement Learning

Why Should I Trust You, Bellman? The Bellman Error is a Poor Replacement for Value Error

Solving Hidden Monotone Variational Inequalities with Surrogate Losses

Provably Efficient Kernelized Q-Learning

Learning Surrogate Losses

Infinite-Horizon Reach-Avoid Zero-Sum Games via Deep Reinforcement Learning

Decision-Focused Learning without Differentiable Optimization: Learning Locally Optimized Decision Losses

An Experimental Comparison Between Temporal Difference and Residual Gradient with Neural Network Approximation

Toward Efficient Gradient-Based Value Estimation

How to Boost Any Loss Function

Reinforcement Learning with Non-Cumulative Objective

Does DQN Learn?

Convex Q-Learning, Part 1: Deterministic Optimal Control

Learning to Teach with Dynamic Loss Functions

Stochastic Loss Function

Mitigating Suboptimality of Deterministic Policy Gradients in Complex Q-functions

Robust Losses for Decision-Focused Learning

A Nearly Optimal and Low-Switching Algorithm for Reinforcement Learning with General Function Approximation