Abstract: In this paper, we explore an approach to auxiliary task discovery in reinforcement learning based on ideas from representation learning. Auxiliary tasks tend to improve data efficiency by forcing the agent to learn auxiliary prediction and control objectives in addition to the main task of maximizing reward, and thus producing better representations. Typically these tasks are designed by people. Meta-learning offers a promising avenue for automatic task discovery; however, these methods are computationally expensive and challenging to tune in practice. In this paper, we explore a complementary approach to auxiliary task discovery: continually generating new auxiliary tasks and preserving only those with high utility. We also introduce a new measure of auxiliary tasks' usefulness based on how useful the features induced by them are for the main task. Our discovery algorithm significantly outperforms random tasks and learning without auxiliary tasks across a suite of environments.
What problem does this paper attempt to address?
This paper addresses how to autonomously discover useful auxiliary tasks in reinforcement learning. Specifically, the authors explore a generate-and-test method for discovering auxiliary tasks automatically. Auxiliary tasks are usually designed by hand, which is limiting because it is hard to know in advance which tasks will be useful, and poorly designed tasks can significantly slow down learning.
### Core Problems of the Paper
1. **Importance of Auxiliary Tasks**:
- Auxiliary tasks can improve data efficiency and produce better representations by forcing the agent to learn additional prediction and control objectives, thus helping the agent learn the main task more quickly.
- Traditional methods rely on manual design of these tasks, which is time-consuming and error-prone.
2. **Limitations of Existing Methods**:
- Meta-learning offers a way to discover tasks automatically, but it is computationally expensive and difficult to tune.
- Randomly generated auxiliary tasks can sometimes help avoid representation collapse and improve performance, but may also cause significant interference and thus reduce performance.
3. **Solutions Proposed in the Paper**:
- A new auxiliary task discovery method based on a generate-and-test mechanism is proposed. It continually generates new auxiliary tasks and retains those that are useful for the main task.
- A new measure of auxiliary task usefulness is introduced: a task is scored by how much the features it induces contribute to the main task.
### Method Overview
1. **Generator**:
- Generates new auxiliary tasks. In the experiments, these tasks are defined as general value functions (GVFs) for subgoal reaching, where the subgoals are randomly selected from the observation space.
2. **Tester**:
- Evaluates the usefulness of each auxiliary task by measuring how much the features induced by that task contribute to the action-value function of the main task.
- Uses a learning scheme called the "Master-User strategy" to ensure that each feature is modified only by gradients back-propagated from a single task, which makes it clear which auxiliary task shapes which feature.
3. **Replacement Mechanism**:
- Regularly replace auxiliary tasks considered useless. To prevent newly generated tasks from being replaced too early, an "age" parameter is introduced, and a task will be replaced only when its age exceeds a certain threshold.
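
The three components above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `AuxTask`, `generate_task`, `feature_utility`, `update_utility`, and `replace_tasks` are hypothetical, as are the `decay` and `replace_fraction` parameters. The utility proxy used here (the magnitude of each feature multiplied by its main-task weight, averaged over time) is one plausible reading of "contribution of the induced features to the main task" and is assumed for illustration. Learning the GVFs themselves is omitted.

```python
import random
from dataclasses import dataclass


@dataclass
class AuxTask:
    """A subgoal-reaching auxiliary task (hypothetical container)."""
    subgoal: tuple        # observation the task's GVF tries to reach
    age: int = 0          # number of utility updates since creation
    utility: float = 0.0  # running estimate of usefulness for the main task


def generate_task(obs_low, obs_high, rng):
    """Generator: sample a subgoal uniformly from the observation space."""
    subgoal = tuple(rng.uniform(lo, hi) for lo, hi in zip(obs_low, obs_high))
    return AuxTask(subgoal=subgoal)


def feature_utility(features, main_weights):
    """Tester proxy (assumption): sum of |feature * main-task weight| over
    the features induced by one auxiliary task."""
    return sum(abs(f * w) for f, w in zip(features, main_weights))


def update_utility(task, features, main_weights, decay=0.9):
    """Age the task and fold the latest utility into a running average."""
    task.age += 1
    u = feature_utility(features, main_weights)
    task.utility = decay * task.utility + (1.0 - decay) * u


def replace_tasks(tasks, age_threshold, replace_fraction, obs_low, obs_high, rng):
    """Replacement: swap out the lowest-utility tasks, but only those whose
    age exceeds the threshold, so new tasks get time to prove themselves."""
    eligible = [i for i, t in enumerate(tasks) if t.age > age_threshold]
    eligible.sort(key=lambda i: tasks[i].utility)  # lowest utility first
    n_replace = int(replace_fraction * len(tasks))
    replaced = eligible[:n_replace]
    for i in replaced:
        tasks[i] = generate_task(obs_low, obs_high, rng)
    return replaced
```

In a training loop, `update_utility` would be called after each main-task update and `replace_tasks` at a fixed interval. Tasks younger than `age_threshold` are never candidates, which corresponds to the protection for newly generated tasks described above.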
### Experimental Results
The paper reports experiments in multiple environments, including four-rooms, maze, and pinball. The results show that:
- The generate-and-test method significantly outperforms the baseline without auxiliary tasks, and in some cases approaches or exceeds the performance of manually designed auxiliary tasks.
- Fixed random auxiliary tasks also yield some performance improvement, especially in the gridworld environments, which is consistent with findings in the literature.
- The generate-and-test method automatically discovers and retains useful auxiliary tasks rather than relying solely on randomness.
In general, the paper proposes an effective method to automatically discover and optimize auxiliary tasks, thereby improving the performance of reinforcement learning systems.