RPAF: A Reinforcement Prediction-Allocation Framework for Cache Allocation in Large-Scale Recommender Systems

Shuo Su, Xiaoshuang Chen, Yao Wang, Yulin Wu, Ziqiang Zhang, Kaiqiao Zhan, Ben Wang, Kun Gai
2024-09-20
Abstract: Modern recommender systems are built upon computation-intensive infrastructure, and because computational resources are limited, it is challenging to perform real-time computation for every request, especially during peak periods. Serving recommendations from user-wise result caches is widely used when the system cannot afford a real-time recommendation. However, it is challenging to allocate real-time and cached recommendations so as to maximize users' overall engagement. This paper identifies two key challenges in cache allocation: the value-strategy dependency and streaming allocation. We then propose a reinforcement prediction-allocation framework (RPAF) to address them. RPAF is a reinforcement-learning-based two-stage framework consisting of a prediction stage and an allocation stage. The prediction stage estimates the values of the cache choices while accounting for the value-strategy dependency, and the allocation stage determines the cache choice for each individual request while satisfying the global budget constraint. We show that training RPAF is challenging because the budget constraint is both global and strict, and we propose a relaxed local allocator (RLA) to address this issue. Moreover, a PoolRank algorithm is used in the allocation stage to handle the streaming allocation problem. Experiments show that RPAF significantly improves users' engagement under computational budget constraints.
Machine Learning, Information Retrieval
What problem does this paper attempt to address?
This paper addresses cache allocation in large-scale recommender systems: how to maximize overall user engagement when computational resources are too limited to serve every request in real time. The paper focuses on two key challenges:

1. **Value-Strategy Dependency**: Existing computational resource allocation methods assume that requests in different time periods are independent and that the value of computational resources does not depend on the allocation strategy. Neither assumption holds for cache allocation. First, each user's result cache is limited in size; if the system keeps serving cached results to the same user, the cache is quickly exhausted and the user experience degrades rapidly. Second, the choice of whether to use the cache affects not only the user's feedback on the current request but also the user's future behavior. The value of the current cache choice therefore depends on the future cache allocation strategy.
2. **Streaming Allocation**: Existing methods typically allocate a batch of requests within each time period. In an online recommender system, however, requests arrive as a stream, and the system must decide the cache choice for each request as it arrives while still satisfying a global computational budget.

To address these challenges, the paper proposes the Reinforcement Prediction-Allocation Framework (RPAF), a two-stage approach:

- **Prediction Stage**: uses reinforcement learning to estimate the value of each cache choice, accounting for the value-strategy dependency.
- **Allocation Stage**: uses the estimated values to allocate cache choices for streaming requests while satisfying the global budget constraint.
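The value-strategy dependency can be made concrete with a toy model. The sketch below is an illustration under stated assumptions, not the paper's environment: the reward constants (`R_REALTIME`, `R_CACHE`, `R_EMPTY`), the cache-refill rule in `next_cache`, and the two example policies are all hypothetical. It computes the worth of spending compute on the current request, `Q(real-time) - Q(cache)`, for the same user state under two different future policies.

```python
from typing import Callable

# Hypothetical toy model (not the paper's simulator).
# Actions: 1 = compute a real-time recommendation (uses compute budget),
#          0 = serve the next result from the user's cache (free).
# Rewards are arbitrary integer "engagement units" so the arithmetic is exact.
R_REALTIME = 10  # real-time recommendation
R_CACHE = 8      # fresh cached result
R_EMPTY = 2      # cache exhausted: stale or fallback result

Policy = Callable[[int], int]  # maps cache_left -> action


def reward(action: int, cache_left: int) -> int:
    if action == 1:
        return R_REALTIME
    return R_CACHE if cache_left > 0 else R_EMPTY


def next_cache(action: int, cache_left: int, cache_size: int) -> int:
    # Assumption: a real-time request refills the cache to full size,
    # while a cache hit consumes one cached result.
    if action == 1:
        return cache_size
    return max(cache_left - 1, 0)


def q_value(action: int, cache_left: int, horizon: int,
            future_policy: Policy, cache_size: int) -> int:
    """Undiscounted return of taking `action` on the current request and then
    following `future_policy` on the remaining horizon - 1 requests."""
    total = reward(action, cache_left)
    cache = next_cache(action, cache_left, cache_size)
    for _ in range(horizon - 1):
        a = future_policy(cache)
        total += reward(a, cache)
        cache = next_cache(a, cache, cache_size)
    return total


def realtime_gain(cache_left: int, horizon: int,
                  future_policy: Policy, cache_size: int) -> int:
    """Value of spending compute now: Q(real-time) - Q(cache)."""
    return (q_value(1, cache_left, horizon, future_policy, cache_size)
            - q_value(0, cache_left, horizon, future_policy, cache_size))


def always_cache(cache_left: int) -> int:
    return 0  # never refresh in the future


def refresh_when_empty(cache_left: int) -> int:
    return 1 if cache_left == 0 else 0  # refresh only once the cache runs out


if __name__ == "__main__":
    for policy in (always_cache, refresh_when_empty):
        print(policy.__name__,
              realtime_gain(cache_left=1, horizon=4,
                            future_policy=policy, cache_size=2))
```

With one cached result left, a cache size of 2, and four requests in total, the script prints a gain of 14 under `always_cache` and 2 under `refresh_when_empty`. The same request in the same state is worth very different amounts of compute depending on how future requests are allocated, which is why, in this toy setting, values cannot be estimated independently of the allocation strategy.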
Because the budget constraint is both global and strict, RPAF is difficult to train directly. To handle this, the paper introduces a Relaxed Local Allocator (RLA), which transforms the constrained reinforcement learning problem into a computationally tractable form. In the allocation stage, the paper uses a PoolRank algorithm to handle streaming allocation, so that the budget constraint is strictly satisfied at each time step. Experiments show that RPAF significantly improves user engagement under computational budget constraints.
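This summary does not spell out how RLA or PoolRank work, so the sketch below does not reproduce either of them. It illustrates two generic ideas with a similar purpose, under stated assumptions: a Lagrangian relaxation that turns a global budget into a per-request threshold `lam` (fit by bisection on a historical batch of predicted gains), and a streaming pass that applies that threshold while enforcing a hard per-step capacity. The function names, the gain values, and the first-come, first-served capacity rule are all hypothetical.

```python
from typing import List


def fit_lambda(gains: List[float], budget: int, iters: int = 50) -> float:
    """Lagrangian relaxation of a global budget (generic technique, not RLA).

    Maximizing sum(g_i * x_i) - lam * (sum(x_i) - budget) over x_i in {0, 1}
    decouples into the per-request rule x_i = 1 iff g_i > lam. Bisection finds
    the smallest lam >= 0 (up to tolerance) for which that rule selects at
    most `budget` requests from this historical batch.
    """
    lo, hi = 0.0, max(0.0, max(gains, default=0.0))
    for _ in range(iters):
        mid = (lo + hi) / 2
        if sum(g > mid for g in gains) > budget:
            lo = mid  # too many requests selected: raise the price of compute
        else:
            hi = mid  # feasible: try a lower price
    return hi


def stream_allocate(gains: List[float], lam: float, capacity: int) -> List[bool]:
    """Decide each arriving request in order; True means compute in real time.

    The threshold comes from the relaxed problem, while the hard `capacity`
    check keeps this time step strictly within its budget.
    """
    decisions: List[bool] = []
    used = 0
    for g in gains:
        take = g > lam and used < capacity
        decisions.append(take)
        if take:
            used += 1
    return decisions


if __name__ == "__main__":
    history = [5.0, 1.0, 3.0, 8.0, 2.0, 7.0]  # hypothetical predicted gains
    lam = fit_lambda(history, budget=3)
    print(lam)
    print(stream_allocate([6.0, 2.0, 9.0, 4.0, 10.0], lam, capacity=2))
```

With these inputs `fit_lambda` returns 3.0 (only the three largest historical gains, 8, 7, and 5, exceed it), and the stream yields `[True, False, True, False, False]`. The last request has the largest gain (10) but is served from the cache because the step's capacity was already spent; a purely first-come rule can waste budget this way, which is one reason a streaming allocator might rank requests within a pool instead.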