Automated Play-Testing Through RL Based Human-Like Play-Styles Generation

Pierre Le Pelletier de Woillemont, Rémi Labory, Vincent Corruble
DOI: https://doi.org/10.48550/arXiv.2211.17188
IF: 5.414
2022-11-29
Machine Learning
Abstract: The increasing complexity of gameplay mechanisms in modern video games is leading to the emergence of a wider range of ways to play games. The variety of possible play-styles needs to be anticipated by designers through automated tests. Reinforcement Learning is a promising answer to the need to automate video game testing. To that end, one needs to train an agent to play the game while ensuring that this agent generates the same play-styles as the players, in order to give meaningful feedback to the designers. We present CARMI: a Configurable Agent with Relative Metrics as Input, an agent able to emulate the players' play-styles, even on previously unseen levels. Unlike current methods, it does not rely on full trajectories but only on summary data. Moreover, it requires only a small amount of human data, which makes it compatible with the constraints of modern video game production. This novel agent could be used to investigate behaviors and balancing during the production of a video game with a realistic amount of training time.
What problem does this paper attempt to address?
This paper addresses the wide variety of play-styles that arises from the increasingly complex gameplay mechanisms of modern video games. Designers must anticipate these diverse play-styles during development and use automated testing to assess the game's difficulty and balance. Specifically, the paper trains an agent with Reinforcement Learning (RL) to generate play-styles similar to those of human players, so that the agent gives designers meaningful feedback. The key issues are:

1. **How to conduct effective automated testing with limited human data**: Complete human trajectories are very difficult to obtain in the early stages of game development. The agent must therefore be trainable from only a small amount of aggregated (summary) data.
2. **How to ensure that the agent's play-styles match those of human players**: Automated test results are only useful to designers if the agent imitates the play-styles of different players, rather than simply pursuing high scores or victory.
3. **How to generalize human play-styles to unseen levels**: The agent must generalize well enough to produce plausible human play-styles on new levels.

To address these problems, the paper proposes CARMI (Configurable Agent with Relative Metrics as Input), a configurable agent that takes relative metrics as input. CARMI's main contribution is that it learns, within a single RL training loop, to generate a range of play-styles consistent with the player distribution. It does this without complete trajectory data, relying only on a small amount of key aggregated data. As a result, CARMI also exhibits diverse, human-like play-styles on new levels, which helps designers assess the game's difficulty and balance more accurately.
### Formula Representation

- **Summary function** \(\psi\) maps a complete trajectory \(\tau=(s_0, a_1, s_1, \ldots, a_T, s_T)\) to a vector of \(M\) aggregated values:
  \[
  \psi(\tau)\in\mathbb{R}^M
  \]
- **Normalized aggregated data** \(\psi_l(\tau)\) is the aggregated data normalized by the player distribution on the \(l\)-th level:
  \[
  \psi_l(\tau):=\frac{\psi(\tau)-\mu_l}{\sigma_l}
  \]
  Here \(\mu_l\) and \(\sigma_l\) are the mean and standard deviation of the players' aggregated data on the \(l\)-th level, and the division is element-wise.
- **Reward function** \(r_t\) is defined as:
  \[
  r_t = d_{t-1}-d_t+[-d_t]_{t=T}
  \]
  Here \(d_t=\Delta(\psi_l(\tau_{\pi_z, t}); z)\) is the distance, measured by \(\Delta\), between the target \(z\) and the normalized aggregated data of the trajectory that the policy \(\pi_z\) has generated up to step \(t\). The bracketed term is added only at the final step \(t=T\).

With these rewards, CARMI gradually learns to imitate human players' play-styles during training and keeps those play-styles consistent on new levels.
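The following minimal Python sketch makes the three definitions concrete by computing per-step rewards from player statistics on one level. It is not the authors' implementation, and it rests on several assumptions:

- The trajectory is simplified to a list of per-step metric increments, and their sum plays the role of \(\psi\).
- \(\Delta\) is the Euclidean distance.
- \(\sigma_l\) is the population standard deviation.
- All function names (`summary`, `level_stats`, `normalize`, `distance`, `rewards`) are hypothetical.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

Vector = List[float]


def summary(increments: Sequence[Sequence[float]], num_metrics: int) -> Vector:
    """Hypothetical psi: sum per-step metric increments into M aggregated values.

    An empty prefix (only s_0 observed) yields an all-zero summary.
    """
    totals = [0.0] * num_metrics
    for step in increments:
        for i, value in enumerate(step):
            totals[i] += value
    return totals


def level_stats(player_summaries: Sequence[Vector]) -> Tuple[Vector, Vector]:
    """Per-metric mean mu_l and (assumed population) std sigma_l on level l."""
    columns = list(zip(*player_summaries))
    mu = [mean(col) for col in columns]
    sigma = [pstdev(col) for col in columns]
    return mu, sigma


def normalize(psi: Vector, mu: Vector, sigma: Vector, eps: float = 1e-8) -> Vector:
    """psi_l(tau) = (psi(tau) - mu_l) / sigma_l, element-wise.

    sigma is floored at eps so a metric with no spread among players
    does not cause a division by zero.
    """
    return [(p - m) / max(s, eps) for p, m, s in zip(psi, mu, sigma)]


def distance(x: Vector, z: Vector) -> float:
    """Assumed Delta: Euclidean distance between normalized metrics and target z."""
    return math.dist(x, z)


def rewards(increments: Sequence[Sequence[float]], z: Vector,
            mu: Vector, sigma: Vector) -> Vector:
    """r_t = d_{t-1} - d_t for t = 1..T, with an extra -d_T added at t = T."""
    T = len(increments)
    if T == 0:
        raise ValueError("need at least one step")
    m = len(z)
    # d[t] is the distance after the first t steps (d[0]: only s_0 observed).
    d = [distance(normalize(summary(increments[:t], m), mu, sigma), z)
         for t in range(T + 1)]
    r = [d[t - 1] - d[t] for t in range(1, T + 1)]
    r[-1] -= d[T]  # terminal term [-d_t]_{t=T}
    return r
```

Summing the rewards telescopes:

\[
\sum_{t=1}^{T} r_t = (d_0 - d_T) - d_T = d_0 - 2d_T.
\]

Under this reading of the bracketed term, \(d_0\) is fixed before the agent acts, so maximizing the return amounts to driving the final distance \(d_T\) toward zero. For clarity, the sketch recomputes every \(d_t\) from scratch. A real environment would more likely update \(\psi\) incrementally at each step.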