WDIBS: Wasserstein Deterministic Information Bottleneck for State Abstraction to Balance State-Compression and Performance

Zhu Xianchao, Huang Tianyi, Zhang Ruiyuan, Zhu William
DOI: https://doi.org/10.1007/s10489-021-02787-4
IF: 5.3
2021-01-01
Applied Intelligence
Abstract: As an important branch of reinforcement learning, apprenticeship learning studies how an agent learns good behavioral decisions by observing an expert policy in the environment. It has achieved many encouraging breakthroughs in real-world applications. State abstraction is typically used to compress the state space of the environment and eliminate redundant information, thereby improving learning efficiency. However, excessive compression results in poor decision performance, so it is important to balance the degree of compression against decision performance. Deterministic Information Bottleneck for State abstraction (DIBS) attempts to solve this problem. DIBS first uses the information rate to represent the degree of compression. It then measures decision performance after compression using the Kullback-Leibler (KL) divergence between the policy after state compression and the expert policy. However, if the two distributions do not have exactly overlapping support sets, the KL divergence is usually infinite, which leads to poor decision performance at low information rates. In this paper, we propose the Wasserstein DIBS (WDIBS) algorithm to optimize the trade-off between the degree of compression and decision performance. Specifically, we use the Wasserstein distance to measure the difference between the policy after state compression and the expert policy. Even if the two distributions do not have exactly overlapping support sets, the Wasserstein distance still reflects their actual difference, which ensures that WDIBS performs well at low information rates. Theoretical analyses and experiments demonstrate that our method provides a better trade-off between the degree of compression and decision performance than DIBS.
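The contrast between the two measures can be illustrated with a small sketch. This is not the WDIBS algorithm itself. It only compares the KL divergence and a one-dimensional Wasserstein-1 distance on toy action distributions. The function names, the toy distributions, and the assumption that actions sit on an integer line with ground cost |i - j| are all hypothetical choices made for illustration.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions given as equal-length lists."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # the term 0 * log(0 / q) is taken as 0
        if qi == 0.0:
            return math.inf  # p puts mass where q has none
        total += pi * math.log(pi / qi)
    return total


def wasserstein_1d(p, q):
    """W1 between distributions on points 0..n-1, assuming cost |i - j|.

    In one dimension, W1 equals the summed absolute gap between the CDFs.
    """
    cdf_gap = 0.0
    distance = 0.0
    for pi, qi in zip(p, q):
        cdf_gap += pi - qi
        distance += abs(cdf_gap)
    return distance


# Hypothetical policies over four ordered actions (illustrative only).
expert = [1.0, 0.0, 0.0, 0.0]  # expert always picks action 0
near = [0.0, 1.0, 0.0, 0.0]    # compressed policy picks a neighbouring action
far = [0.0, 0.0, 0.0, 1.0]     # compressed policy picks a distant action

for name, policy in (("near", near), ("far", far)):
    print(name, kl_divergence(expert, policy), wasserstein_1d(expert, policy))
```

Under these assumptions the KL divergence is infinite for both compressed policies, so it cannot tell them apart, while the Wasserstein distance is smaller for the policy whose action lies closer to the expert's.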