Digital Human Interactive Recommendation Decision-Making Based on Reinforcement Learning

Xiong Junwu, Xiaoyun Feng, YunZhou Shi, James Zhang, Zhongzhou Zhao, Wei Zhou
DOI: https://doi.org/10.48550/arXiv.2210.10638
2022-11-04
Abstract: Digital human recommendation systems have been developed to help customers find their favorite products and are playing an active role in various recommendation contexts. How to capture and learn the dynamics of customers' preferences in a timely manner, while meeting their exact requirements, becomes crucial in the digital human recommendation domain. We design a novel practical digital human interactive recommendation agent framework based on Reinforcement Learning (RL) to improve the efficiency of interactive recommendation decision-making by leveraging both the digital human features and the superior flexibility of RL. Our proposed framework learns dynamically through real-time interactions between the digital human and customers, using state-of-the-art RL algorithms combined with multimodal embedding and graph embedding, to improve the accuracy of personalization and thus enable the digital human agent to catch the customer's attention in a timely manner. Experiments on real business data demonstrate that our framework can provide better personalized customer engagement and better customer experiences.
Information Retrieval, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
The main problem this paper attempts to solve is how to capture and learn customers' dynamic preferences in a digital human recommendation system in a timely manner while meeting their exact needs. Specifically, the paper focuses on improving the efficiency of interactive recommendation decision-making and the accuracy of personalization, especially in real-time interactive scenarios such as product recommendations in virtual live-streaming rooms.

### Problem Background

Traditional recommendation systems usually adopt a two-stage model: first, prepare a batch of personalized candidate items; then rank these candidates and push them to customers. This model suits classic passive display-type recommendation scenarios, but it has limitations. For example, customers can only passively consume the prepared items, and there is usually only one interaction opportunity (such as a click or a view). With the increase in online shopping activity, especially during the epidemic when people reduced offline social interactions, location-based services and real-time recommendations have become more important. This requires the system to understand customers' preferences promptly and meet their needs through dynamic interactions.

### Paper Solution

To address these problems, the authors propose a new practical framework based on Reinforcement Learning (RL), called **MAgent** ("M" represents digital human), for digital human interactive recommendation agents. The framework improves the performance of the recommendation system in the following ways:

1. **Real-time Interactive Learning**: Use state-of-the-art RL algorithms, combined with multi-modal embeddings (text, image, video, audio) and graph embeddings (knowledge graph and social graph), to learn customers' dynamic preferences through real-time interactions with them.
2. **Improvement of Personalization Precision**: Use the flexibility and adaptability of RL algorithms to improve the accuracy of personalized recommendations, enabling digital humans to attract customers' attention in a timely manner during multi-round interactions.
3. **Multi-modal and Graph Embeddings**: Incorporate multi-modal and graph embedding techniques to further enhance the system's cognitive ability, enabling digital humans to better understand and respond to customers' needs.

### Experimental Verification

The paper verifies the effectiveness of the proposed MAgent framework through experiments. The results show that, on real business data from virtual live-streaming rooms, MAgent significantly improves recommendation decision-making compared with the traditional passive display-type recommendation system. For example, in the decision-making task over product content types, MAgent outperforms the baseline model DFM (Deep Factorization Model) on metrics such as Mean Reciprocal Rank (MRR) and Hits@K.

### Summary

This paper addresses the limitations of traditional recommendation systems in dynamic interaction scenarios by introducing a framework based on Reinforcement Learning, and improves the personalization accuracy and real-time responsiveness of the recommendation system. Future work will extend the framework to more complex combinatorial decision spaces and explore more application scenarios to improve user experience and transaction conversion rate.
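
The two ranking metrics named in the experimental verification, MRR and Hits@K, can be illustrated with a minimal sketch. It uses the standard textbook definitions; the paper's exact evaluation protocol is not described in this summary, so the function names, the content-type labels, and the toy rankings below are all assumptions made for illustration, not the authors' data or code.

```python
from typing import Sequence


def reciprocal_rank(ranked: Sequence[str], target: str) -> float:
    """Return 1/rank of `target` in `ranked` (1-based), or 0.0 if absent."""
    for position, item in enumerate(ranked, start=1):
        if item == target:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(
    rankings: Sequence[Sequence[str]], targets: Sequence[str]
) -> float:
    """Average the reciprocal rank of the true item over all interactions."""
    if not rankings:
        return 0.0
    total = sum(reciprocal_rank(r, t) for r, t in zip(rankings, targets))
    return total / len(rankings)


def hits_at_k(
    rankings: Sequence[Sequence[str]], targets: Sequence[str], k: int
) -> float:
    """Fraction of interactions whose true item appears in the top k."""
    if not rankings:
        return 0.0
    hits = sum(1 for r, t in zip(rankings, targets) if t in r[:k])
    return hits / len(rankings)


# Hypothetical toy data: each ranking orders candidate content types for
# one interaction; each target is the type the customer actually engaged with.
rankings = [
    ["video", "image", "text", "audio"],
    ["image", "text", "video", "audio"],
    ["text", "audio", "image", "video"],
]
targets = ["video", "video", "audio"]

print("MRR:", mean_reciprocal_rank(rankings, targets))
for k in (1, 2, 3):
    print(f"Hits@{k}:", hits_at_k(rankings, targets, k))
```

MRR rewards placing the true item near the top of the list, while Hits@K only checks whether it appears within the first K positions, so the two are usually reported together when comparing a model against a baseline.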