Deep Policy Networks for NPC Behaviors that Adapt to Changing Design Parameters in Roguelike Games

Alessandro Sestini, Alexander Kuhnle, Andrew D. Bagdanov
DOI: https://doi.org/10.48550/arXiv.2012.03532
2020-12-07
Abstract: Recent advances in Deep Reinforcement Learning (DRL) have largely focused on improving the performance of agents with the aim of replacing humans in known and well-defined environments. The use of these techniques as a game design tool for video game production, where the aim is instead to create Non-Player Character (NPC) behaviors, has received relatively little attention until recently. Turn-based strategy games like Roguelikes, for example, present unique challenges to DRL. In particular, the categorical nature of their complex game state, composed of many entities with different attributes, requires agents able to learn how to compare and prioritize these entities. Moreover, this complexity often leads to agents that overfit to states seen during training and that are unable to generalize in the face of design changes made during development. In this paper we propose two network architectures which, when combined with a *procedural loot generation* system, are able to better handle complex categorical state spaces and to mitigate the need for retraining forced by design decisions. The first is based on a dense embedding of the categorical input space that abstracts the discrete observation model and renders trained agents more able to generalize. The second proposed architecture is more general and is based on a Transformer network able to reason relationally about input and input attributes. Our experimental evaluation demonstrates that new agents have better adaptation capacity with respect to a baseline architecture, making this framework more robust to dynamic gameplay changes during development. Based on the results shown in this paper, we believe that these solutions represent a step forward towards making DRL more accessible to the gaming industry.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
The paper asks how deep reinforcement learning (DRL) agents can adapt to game design parameters that keep changing during video game development, particularly in Roguelike games. Specifically, the authors focus on the adaptability and scalability of non-player character (NPC) behaviors.

### Problem Background

In Roguelike games, training NPC behaviors with DRL is difficult for three main reasons:

1. **Complex discrete state space**: Entities in the game (such as weapons and items) have different attributes, and these attributes are usually discrete. As a result, DRL agents tend to overfit to the states seen during training and generalize poorly to unseen states.
2. **Changes in design parameters**: During game development, developers often need to adjust game parameters (for example, changing the distribution of dropped items to balance the game). Such changes usually invalidate the trained agents, which then need to be retrained.
3. **Limitations of the fixed-ID representation**: In the original DeepCrawl framework, each item is represented by a unique integer ID, which limits the flexibility and generalization ability of the system.

### Main Contributions of the Paper

To address these problems, the authors propose two new network architectures and combine them with a procedural loot generation system to improve the adaptability and scalability of NPC behaviors.

#### 1. Dense Embedding Policy Network

- **Core idea**: Using a multi-channel map input, embed the entity type and its attributes at each position into a continuous vector representation. This handles the discrete input space better, and the network structure does not need to be redefined when attributes are added or modified.
- **Implementation method**: Use dense embedding layers to combine multiple categorical values and map them to a fixed-size continuous representation. Specifically, each channel contains a one-hot encoding of a category, which is then transformed by a 1x1 convolutional layer and a tanh activation function.

#### 2. Transformer-based Policy Network

- **Core idea**: Use the self-attention mechanism of the Transformer architecture to explicitly capture the relationships between entities in the scene. This handles interactions between entities more effectively and improves generalization.
- **Implementation method**: Represent each object as an array of its attribute values and process it with a fully connected layer. Then apply a Transformer layer to infer the relationships between items, producing a fixed-size embedding vector for each entity. These embedding vectors are scattered into the spatial map for processing by subsequent convolutional layers.

### Experimental Results

The experiments evaluate the two new architectures on performance, adaptability, and scalability:

- **Performance**: Both architectures achieve a reward level comparable to the original architecture in the new environment, and perform better in some cases.
- **Adaptability**: When the environment changes from a fully procedurally generated loot system to a fixed loot distribution, the new architectures generalize better, especially when facing an unbalanced loot distribution.
- **Scalability**: The new architectures can handle added or modified attributes without redefining the network structure or retraining the agents.
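The dense embedding step described above (one-hot channels passed through a 1x1 convolution and a tanh activation) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the four toy categories, the 3-dimensional embedding, and the random weights are all hypothetical, and only a single categorical channel is shown. A 1x1 convolution is written here as the same linear map applied independently at every map position.

```python
import math
import random

def one_hot(index, size):
    """Return a one-hot list of length `size` with 1.0 at `index`."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec

def conv1x1_tanh(cell, weights, bias):
    """Apply a 1x1 convolution (a per-cell linear map) followed by tanh.

    cell:    input channels for one map position, length C_in
    weights: C_out rows, each of length C_in
    bias:    length C_out
    """
    return [
        math.tanh(sum(w * x for w, x in zip(row, cell)) + b)
        for row, b in zip(weights, bias)
    ]

def embed_map(category_map, num_categories, weights, bias):
    """Embed every cell of a 2D grid of category indices."""
    return [
        [conv1x1_tanh(one_hot(c, num_categories), weights, bias) for c in row]
        for row in category_map
    ]

# Hypothetical setup: 4 categories (e.g. empty, wall, agent, loot), 3-dim embedding.
NUM_CATEGORIES, EMBED_DIM = 4, 3
rng = random.Random(0)  # fixed seed so the toy weights are reproducible
weights = [
    [rng.uniform(-1, 1) for _ in range(NUM_CATEGORIES)]
    for _ in range(EMBED_DIM)
]
bias = [0.0] * EMBED_DIM

category_map = [[0, 1], [2, 3]]  # a 2x2 toy map of category indices
embedded = embed_map(category_map, NUM_CATEGORIES, weights, bias)
```

Because the input is one-hot, each cell's embedding reduces to the tanh of one column of the weight matrix (plus the bias), which is why this construction behaves like a learned lookup table. Further categorical attributes could be handled by concatenating more one-hot channels before the linear map.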
### Conclusion

By introducing the procedurally generated loot system and two new network architectures, the authors improve the adaptability and scalability of NPC behaviors during the dynamic game development process, making DRL technology more suitable for the video game industry.
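For the Transformer-based policy network summarized earlier, the two central operations (self-attention across entity vectors, then scattering the resulting embeddings into a spatial map) can be sketched as follows. This is a simplified illustration, not the paper's architecture: it shows only single-head scaled dot-product attention, omits the initial fully connected layer and the residual, normalization, and feed-forward parts of a full Transformer layer, and uses identity projections in place of learned weights. All names, attribute values, and positions are hypothetical.

```python
import math

def linear(vec, weights):
    """Multiply a vector (length d_in) by a weight matrix given as d_out rows."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(entities, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over entity vectors."""
    queries = [linear(e, w_q) for e in entities]
    keys = [linear(e, w_k) for e in entities]
    values = [linear(e, w_v) for e in entities]
    scale = math.sqrt(len(keys[0]))
    outputs, attn = [], []
    for q in queries:
        # How strongly this entity attends to every entity (itself included).
        row = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        attn.append(row)
        # Attention-weighted sum of the value vectors.
        outputs.append([
            sum(w * v[d] for w, v in zip(row, values))
            for d in range(len(values[0]))
        ])
    return outputs, attn

def scatter(embeddings, positions, height, width):
    """Place each entity embedding at its (row, col) cell; other cells stay zero."""
    dim = len(embeddings[0])
    grid = [[[0.0] * dim for _ in range(width)] for _ in range(height)]
    for emb, (r, c) in zip(embeddings, positions):
        grid[r][c] = list(emb)
    return grid

# Hypothetical entities, each an array of attribute values.
entities = [[2.0, 0.0, 1.0], [0.0, 3.0, 1.0], [1.0, 1.0, 0.0]]
positions = [(0, 0), (1, 2), (2, 1)]

# Identity projections keep the toy example easy to follow.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
contextual, attention = self_attention(entities, identity, identity, identity)
spatial_map = scatter(contextual, positions, height=3, width=3)
```

Each row of `attention` sums to 1, and each entry of `contextual` mixes information from all entities, which is what lets later convolutional layers see entity embeddings that already reflect the other items in the scene.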