Unsupervised Representation Learning of Player Behavioral Data with Confidence Guided Masking
Jiashu Pu,Jianshi Lin,Xiaoxi Mao,Jianrong Tao,Xudong Shen,Yue Shang,Runze Wu
DOI: https://doi.org/10.1145/3485447.3512275
2022-01-01
Abstract:Players of online games generate rich behavioral data during gaming. Based on these data, game developers can build a range of data science applications, such as bot detection and social recommendation, to improve the gaming experience. However, the development of such applications requires data cleansing, training sample labeling, feature engineering, and model development, which makes the use of such applications in small and medium-sized game studios still uncommon. While acquiring supervised learning data is costly, unlabeled behavioral logs are often continuously and automatically generated in games. Thus we resort to unsupervised representation learning of player behavioral data to optimize intelligent services in games. Behavioral data has many unique properties, including semantic complexity, excessive length, etc. A worth noting property within raw player behavioral data is that a lot of it is task-irrelevant. For these data characteristics, we introduce a BPE-enhanced compression method and propose a novel adaptive masking strategy called Masking by Token Confidence (MTC) for the Masked Language Modeling (MLM) pre-training task. MTC is designed to increase the masking probabilities of task-relevant tokens. Experiments on four downstream tasks and successful deployment in a world-renowned Massively Multiplayer Online Role-Playing Game (MMORPG) prove the effectiveness of the MTC strategy1.