Adaptive Spiking TD3+BC for Offline-to-Online Spiking Reinforcement Learning

Xiangfei Yang, Jian Song, Xuetao Zhang, Donglin Wang
DOI: https://doi.org/10.1109/ijcnn60899.2024.10650965
2024-01-01
Abstract: Spiking reinforcement learning (SRL) has gradually received attention because of its ultra-low energy consumption, but almost all current SRL algorithms are online algorithms, which are sample inefficient. It is well known that pure offline reinforcement learning (offline RL) has limited performance. Therefore, we study SRL in the offline-to-online setting, which combines the advantages of low energy consumption and sample efficiency. To the best of our knowledge, this is the first study of offline-to-online SRL. Like offline-to-online RL, offline-to-online SRL also suffers from policy collapse. To overcome this problem, we adaptively adjust the penalty on the behavior cloning term of spiking TD3+BC (SpikTD3+BC) based on how well the policy has adapted to the environment, and propose a stable offline-to-online SRL algorithm named AdaSpikTD3+BC. Experimental results on the D4RL benchmark tasks show that AdaSpikTD3+BC not only avoids policy collapse but also approaches the performance of DNN-based offline-to-online RL while consuming only about 10% of the energy.
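The abstract does not give the adaptation rule, so the following is only a minimal sketch of the general idea, not the authors' method. It shows a TD3+BC-style actor loss with an explicit weight on the behavior cloning term, plus a hypothetical rule that relaxes that weight when online returns hold up and tightens it when they drop. The names `td3_bc_actor_loss`, `update_bc_weight`, `bc_weight`, `step`, and the return-comparison rule are all assumptions made for illustration; a spiking actor would supply `policy_actions` in practice.

```python
def td3_bc_actor_loss(q_values, policy_actions, dataset_actions,
                      alpha=2.5, bc_weight=1.0):
    """TD3+BC-style actor loss with a tunable behavior cloning weight.

    q_values:        Q(s, pi(s)) for each sample in the batch
    policy_actions:  actions proposed by the policy (flattened floats)
    dataset_actions: actions stored in the offline dataset (flattened floats)
    bc_weight:       assumed knob scaling the behavior cloning penalty
    """
    n = len(q_values)
    # Normalize the Q term by its mean magnitude, as TD3+BC does.
    mean_abs_q = sum(abs(q) for q in q_values) / n
    lam = alpha / (mean_abs_q + 1e-8)
    mean_q = sum(q_values) / n
    # Behavior cloning term: mean squared error to the dataset actions.
    bc_term = sum((p - a) ** 2
                  for p, a in zip(policy_actions, dataset_actions))
    bc_term /= len(policy_actions)
    return -lam * mean_q + bc_weight * bc_term


def update_bc_weight(bc_weight, recent_return, reference_return,
                     step=0.1, lo=0.0, hi=1.0):
    """Hypothetical adaptation rule (an assumption, not from the paper).

    Relax the penalty while the policy performs at least as well as the
    reference return; tighten it when performance drops.
    """
    if recent_return >= reference_return:
        bc_weight -= step
    else:
        bc_weight += step
    return min(hi, max(lo, bc_weight))
```

With `bc_weight` fixed at 1.0 the loss reduces to the standard TD3+BC actor objective; letting it shrink toward 0 during online fine-tuning removes the pull toward the offline dataset, which is the trade-off an adaptive scheme has to manage.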