Deep reinforcement learning with a particle dynamics environment applied to emergency evacuation of a room with obstacles

Yihao Zhang, Zhaojie Chai, George Lykotrafitis
DOI: https://doi.org/10.1016/j.physa.2021.125845
2020-12-01
Abstract: A very successful model for simulating emergency evacuation is the social-force model. At the heart of the model is the self-driven force that is applied to an agent and is directed towards the exit. However, it is not clear if the application of this force results in optimal evacuation, especially in complex environments with obstacles. Here, we develop a deep reinforcement learning algorithm in association with the social force model to train agents to find the fastest evacuation path. During training, we penalize every step of an agent in the room and give zero reward at the exit. We adopt the Dyna-Q learning approach. We first show that in the case of a room without obstacles the resulting self-driven force points directly towards the exit as in the social force model and that the median exit time intervals calculated using the two methods are not significantly different. Then, we investigate evacuation of a room with one obstacle and one exit. We show that our method produces similar results with the social force model when the obstacle is convex. However, in the case of concave obstacles, which sometimes can act as traps for agents governed purely by the social force model and prohibit complete room evacuation, our approach is clearly advantageous since it derives a policy that results in object avoidance and complete room evacuation without additional assumptions. We also study evacuation of a room with multiple exits. We show that agents are able to evacuate efficiently from the nearest exit through a shared network trained for a single agent. Finally, we test the robustness of the Dyna-Q learning approach in a complex environment with multiple exits and obstacles. Overall, we show that our model can efficiently simulate emergency evacuation in complex environments with multiple room exits and obstacles where it is difficult to obtain an intuitive rule for fast evacuation.
Machine Learning, Computational Physics
What problem does this paper attempt to address?
This paper aims to achieve efficient emergency evacuation in complex environments containing obstacles. Specifically, the authors combine deep reinforcement learning (DRL) with the Social Force Model (SFM) to train agents to find the fastest evacuation paths. The method is intended to overcome the limitations of the traditional Social Force Model in complex environments, especially in the presence of concave obstacles, which can trap agents and prevent complete evacuation of the room.

### Main Problems

1. **Optimizing Evacuation Paths**: How can agents use deep reinforcement learning to find the fastest evacuation paths in complex environments with obstacles?
2. **Avoiding Traps**: How can agents avoid getting trapped by concave obstacles so that all agents evacuate successfully?
3. **Multi-Exit Evacuation**: How can agents efficiently select the nearest exit when there are multiple exits?
4. **Robustness of the Model**: How can the robustness and effectiveness of the proposed method be verified in complex environments with multiple exits and obstacles?

### Solutions

- **Deep Reinforcement Learning (DRL)**: Use the Dyna-Q learning algorithm, which combines model-free direct reinforcement learning (Q-learning) with model-based planning, and train agents with experience replay and target network updates.
- **Social Force Model (SFM)**: Alongside DRL, use the Social Force Model to describe the interactions between agents and the physical forces in the environment, such as the self-driven force, obstacle-avoidance force, compressive force, frictional force, and viscous damping force.
- **Multi-Agent System**: Train a single agent and transfer the learned policy to the other agents to achieve efficient evacuation of the multi-agent system.
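
The Dyna-Q idea described above (direct Q-learning from real steps, plus planning from a learned model) can be sketched in a heavily simplified form. The example below is not the paper's implementation: it replaces the deep network with a lookup table and the social-force particle environment with a hypothetical 5x5 grid containing one obstacle cell. The grid layout, `ALPHA`, `EPSILON`, `PLANNING_STEPS`, and all function names are assumptions made for illustration; only the step reward (-0.1), the zero exit reward, and γ = 0.999 come from the summary.

```python
import random

# Hypothetical 5x5 grid room (assumed layout): one exit cell, one obstacle cell.
WIDTH, HEIGHT = 5, 5
START, EXIT = (0, 2), (4, 2)
OBSTACLES = {(2, 2)}
ACTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]

STEP_REWARD, GAMMA = -0.1, 0.999                 # values quoted in the summary
ALPHA, EPSILON, PLANNING_STEPS = 0.5, 0.1, 20    # assumed hyperparameters


def step(state, action):
    """Move one cell; hitting a wall or the obstacle leaves the agent in place."""
    nxt = (state[0] + action[0], state[1] + action[1])
    if not (0 <= nxt[0] < WIDTH and 0 <= nxt[1] < HEIGHT) or nxt in OBSTACLES:
        nxt = state
    done = nxt == EXIT
    return nxt, (0.0 if done else STEP_REWARD), done


def train(episodes=200, seed=0):
    rng = random.Random(seed)
    q = {}      # (state, action_index) -> estimated action value
    model = {}  # (state, action_index) -> (next_state, reward, done)

    def best(s):
        return max(range(len(ACTIONS)), key=lambda a: q.get((s, a), 0.0))

    def update(s, a, s2, r, done):
        # One-step Q-learning update toward the bootstrapped target.
        target = r if done else r + GAMMA * q.get((s2, best(s2)), 0.0)
        old = q.get((s, a), 0.0)
        q[(s, a)] = old + ALPHA * (target - old)

    for _ in range(episodes):
        s, done = START, False
        while not done:
            explore = rng.random() < EPSILON
            a = rng.randrange(len(ACTIONS)) if explore else best(s)
            s2, r, done = step(s, ACTIONS[a])
            update(s, a, s2, r, done)        # direct RL from real experience
            model[(s, a)] = (s2, r, done)    # model learning
            for _ in range(PLANNING_STEPS):  # planning from simulated experience
                (ps, pa), (ps2, pr, pdone) = rng.choice(list(model.items()))
                update(ps, pa, ps2, pr, pdone)
            s = s2
    return q, best


def greedy_path(best, start=START, max_steps=50):
    """Follow the greedy policy from `start`, stopping at the exit or the step cap."""
    path, s = [start], start
    for _ in range(max_steps):
        s, _reward, done = step(s, ACTIONS[best(s)])
        path.append(s)
        if done:
            break
    return path
```

Calling `q, best = train()` and then `greedy_path(best)` gives the route the trained table prefers. On this grid the obstacle blocks the straight line from start to exit, so the shortest possible detour is 6 steps; a well-trained policy should find a route of about that length without any hand-written obstacle-avoidance rule.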
### Key Technologies

- **Dyna-Q Learning Algorithm**: Combine Q-learning with model learning to improve the policy using both real and simulated experience.
- **Neural Network**: Use a deep neural network (DNN) to approximate the action-value function (Q-function), minimizing the loss function by gradient descent.
- **Exploration-Exploitation**: Adopt the ε-greedy strategy: explore heavily in the initial stage, then gradually rely more on the learned knowledge.

### Experimental Setup

- **Environment**: A two-dimensional room containing agents, exits, walls, and obstacles.
- **Reward Mechanism**: Give a negative reward (-0.1) at each step and a zero reward (0) on reaching the exit. A discount factor close to 1 (γ = 0.999) keeps future rewards heavily weighted.
- **Training Process**: Over a large number of training episodes, gradually optimize the agents' policy until evacuation is efficient.

### Conclusion

The paper verifies the effectiveness and robustness of the proposed method through experiments in simple environments (without obstacles) and complex environments (with obstacles and multiple exits). The results show that the method not only reproduces results similar to those of the Social Force Model, but also avoids traps and achieves complete evacuation in complex environments. In addition, by training a single agent and transferring its policy to other agents, the method performs well in multi-agent systems.
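
To make the reward mechanism and the ε-greedy strategy listed under Experimental Setup and Key Technologies more concrete, here is a small sketch. The step reward, exit reward, and discount factor are the values quoted in the summary. The linear ε schedule, its endpoints, and every function name are assumptions for illustration, since the summary does not state how ε is decayed.

```python
import random

STEP_REWARD = -0.1  # per-step penalty quoted in the summary
GAMMA = 0.999       # discount factor quoted in the summary


def discounted_return(steps_to_exit):
    """Discounted return for an agent that exits on its `steps_to_exit`-th step.

    Every step before the exiting one earns STEP_REWARD; the exiting step earns 0.
    """
    return sum(STEP_REWARD * GAMMA**k for k in range(steps_to_exit - 1))


def epsilon(episode, start=1.0, end=0.1, decay_episodes=1000):
    """Assumed linear decay of the exploration rate from `start` to `end`."""
    frac = min(episode / decay_episodes, 1.0)
    return start + frac * (end - start)


def epsilon_greedy(q_values, eps, rng=random):
    """Pick a random action with probability `eps`, else the highest-valued one."""
    if rng.random() < eps:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)
```

Because every step inside the room costs the same penalty, a shorter route always has a higher return: `discounted_return(10)` is greater than `discounted_return(20)`. With γ = 0.999 the return can never fall below -0.1 / (1 - 0.999) = -100, so action values stay bounded even for very long episodes.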