Exploiting Structured Feature and Runtime Isolation for High-Performant Recommendation Serving
Xin You, Hailong Yang, Siqi Wang, Tao Peng, Chen Ding, Xinyuan Li, Bangduo Chen, Zhongzhi Luan, Tongxuan Liu, Yong Li, Depei Qian
DOI: https://doi.org/10.1109/tc.2024.3449749
IF: 3.183
2024-10-12
IEEE Transactions on Computers
Abstract: Recommendation serving with deep learning models is one of the most valuable services of modern e-commerce companies. In production, where billions of recommendation queries must be served under stringent service level agreements, high-performant recommendation serving systems are essential to meeting this demand. Unfortunately, existing model serving frameworks fail to serve efficiently because of two challenges unique to this setting: 1) the input format mismatch between what the service needs and what the model supports, and 2) heavy software contention when concurrently executing the constrained operations. To address these challenges, we propose RecServe, a high-performant serving system for recommendation built on two optimized designs: structured features and SessionGroups. With structured features, RecServe packs single-user-multiple-candidates inputs by semi-automatically transforming computation graphs with annotated input tensors, which significantly reduces redundant network transmission, data movement, and useless computation. With SessionGroups, RecServe adopts resource isolation for multiple compute streams and a cost-aware operator scheduler with a critical-path-based scheduling policy to enable concurrent kernel execution, further improving serving throughput. Experimental results demonstrate that RecServe achieves speedups of up to 12.3× and 22.0× over the state-of-the-art serving system on CPU and GPU platforms, respectively.
Engineering, Electrical & Electronic; Computer Science, Hardware & Architecture
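
The abstract's idea of packing single-user-multiple-candidates inputs can be illustrated with a small sketch. This is not RecServe's graph transformation. It is a toy linear scorer, and every name in it (`StructuredQuery`, `flat_rows`, `score_structured`, the weight vectors) is a hypothetical chosen for illustration. It only shows why storing user features once, rather than replicating them per candidate, reduces both the values transmitted and the redundant computation while leaving the scores unchanged.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StructuredQuery:
    """Hypothetical packed input: one user, many candidates."""
    user_features: List[float]             # stored once per query
    candidate_features: List[List[float]]  # one row per candidate item


def dot(weights: List[float], values: List[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(w * x for w, x in zip(weights, values))


def flat_rows(query: StructuredQuery) -> List[List[float]]:
    """Conventional layout: user features are copied into every row."""
    return [query.user_features + cand for cand in query.candidate_features]


def values_sent_flat(query: StructuredQuery) -> int:
    """Number of feature values transmitted with the replicated layout."""
    return sum(len(row) for row in flat_rows(query))


def values_sent_structured(query: StructuredQuery) -> int:
    """Number of feature values transmitted with the packed layout."""
    return len(query.user_features) + sum(
        len(cand) for cand in query.candidate_features
    )


def score_flat(query, w_user, w_cand) -> List[float]:
    """Recomputes the user-side term once per candidate row."""
    n_user = len(query.user_features)
    return [
        dot(w_user, row[:n_user]) + dot(w_cand, row[n_user:])
        for row in flat_rows(query)
    ]


def score_structured(query, w_user, w_cand) -> List[float]:
    """Computes the user-side term once and reuses it for all candidates."""
    user_term = dot(w_user, query.user_features)
    return [user_term + dot(w_cand, cand) for cand in query.candidate_features]


# Assumed example data: 3 user features, 4 candidates with 1 feature each.
query = StructuredQuery(
    user_features=[1.0, 2.0, 3.0],
    candidate_features=[[1.0], [2.0], [3.0], [4.0]],
)
w_user = [1.0, 0.5, 0.0]
w_cand = [2.0]

flat_scores = score_flat(query, w_user, w_cand)
packed_scores = score_structured(query, w_user, w_cand)
```

In this toy case the replicated layout carries 16 feature values and the packed layout carries 7, and the gap widens as the number of candidates per user grows. A real serving system would apply the same idea at the level of tensors and computation graphs rather than Python lists.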