Causal Campbell-Goodhart's law and Reinforcement Learning

Hal Ashton
DOI: https://doi.org/10.48550/arXiv.2011.01010
2021-02-18
Abstract: Campbell-Goodhart's law relates to the causal inference error whereby decision-making agents aim to influence variables which are correlated to their goal objective but do not reliably cause it. This is a well known error in Economics and Political Science but not widely labelled in Artificial Intelligence research. Through a simple example, we show how off-the-shelf deep Reinforcement Learning (RL) algorithms are not necessarily immune to this cognitive error. The off-policy learning method is tricked, whilst the on-policy method is not. The practical implication is that naive application of RL to complex real life problems can result in the same types of policy errors that humans make. Great care should be taken around understanding the causal model that underpins a solution derived from Reinforcement Learning.
Machine Learning, Artificial Intelligence, General Economics
What problem does this paper attempt to address?
The paper addresses how decision-making agents in Reinforcement Learning (RL) can end up choosing sub-optimal policies because of causal inference errors. Specifically, when an agent tries to achieve its goal by influencing variables that are correlated with that goal, the correlation may not reflect a causal relationship. This error is known as the Campbell-Goodhart law.

### Introduction to the Campbell-Goodhart law

The Campbell-Goodhart law states that when a measure becomes a target, it is no longer a good measure. In other words, problems occur when optimizing a proxy measure breaks down the statistical relationship between that proxy and the actual target. For example, in economics and the social sciences, an indicator that is adopted as a policy target often loses its original usefulness as an indicator.

### Problem description in the paper

Through a simple example, the author shows that off-the-shelf deep reinforcement learning algorithms are not necessarily immune to this cognitive error. Specifically:

- **Off-policy learning methods (such as DQN)**: are easily misled because they learn from a mix of data generated by different policies, which makes it hard to distinguish causal relationships from mere correlations.
- **On-policy learning methods (such as A2C)**: are less easily misled because they learn only from data generated by the current policy, which keeps the causal relationships clearer. In the paper's example, the on-policy method is not tricked.

### Risks in practical applications

Naive application of reinforcement learning to complex real-world problems can lead to the same types of policy errors that humans make. Great care is therefore needed in understanding the causal model that underpins a solution derived from reinforcement learning. Doing so helps to avoid sub-optimal policies and improves the reliability and effectiveness of RL in practical applications.

### Summary

Using a toy problem called "the dog barometer problem", the paper shows the difficulties that existing reinforcement learning algorithms can run into on problems with a causal structure, and it emphasizes that understanding causal relationships is important for building effective and reliable reinforcement learning systems.
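
The gap between correlation and causation that the law turns on can be illustrated with a small simulation. The sketch below is a hypothetical barometer-and-rain model and is not the environment used in the paper: the function names, the probabilities, and the causal structure (air pressure as a common cause of both the barometer reading and rain) are all assumptions made for illustration. It compares how often it rains when the barometer is merely observed to read high with how often it rains when an agent forces the barometer to read high.

```python
import random


def sample_day(rng, set_barometer_low=None):
    """Draw one day from a hypothetical causal model (illustrative only).

    Air pressure is the common cause: it determines the barometer reading
    and the chance of rain. Passing `set_barometer_low` models an
    intervention that forces the dial without changing the pressure.
    """
    low_pressure = rng.random() < 0.5
    # Assumed probabilities: rain is likely under low pressure, unlikely otherwise.
    rain = rng.random() < (0.9 if low_pressure else 0.1)
    if set_barometer_low is None:
        barometer_low = low_pressure       # the dial passively tracks pressure
    else:
        barometer_low = set_barometer_low  # the agent overrides the dial
    return barometer_low, rain


def rain_rate(days, barometer_low):
    """Fraction of rainy days among days with the given barometer reading."""
    rains = [rain for reading, rain in days if reading == barometer_low]
    return sum(rains) / len(rains)


def run(n=20000, seed=0):
    rng = random.Random(seed)
    observed = [sample_day(rng) for _ in range(n)]
    forced_high = [sample_day(rng, set_barometer_low=False) for _ in range(n)]
    return {
        "observed_rain_given_high": rain_rate(observed, False),
        "observed_rain_given_low": rain_rate(observed, True),
        "forced_high_rain": rain_rate(forced_high, False),
    }


if __name__ == "__main__":
    for name, value in run().items():
        print(f"{name}: {value:.3f}")
```

Under these assumed probabilities, a high reading in the observational data goes with a low rain rate, while forcing the reading high leaves the rain rate at its overall base rate. An agent that learned only the correlation would wrongly conclude that manipulating the barometer controls the weather.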