Heterogeneous Acceleration Pipeline for Recommendation System Training

Muhammad Adnan, Yassaman Ebrahimzadeh Maboud, Divya Mahajan, Prashant J. Nair
2024-04-28
Abstract: Recommendation models rely on deep learning networks and large embedding tables, which make training both compute-intensive and memory-intensive. These models are typically trained in either a hybrid CPU-GPU configuration or a GPU-only configuration. The hybrid mode combines the GPU's neural network acceleration with the CPU's memory capacity for storing and supplying embedding tables, but it can incur significant CPU-to-GPU transfer time. In contrast, the GPU-only mode stores embedding tables in the High Bandwidth Memory (HBM) of multiple GPUs. However, this approach is expensive and raises scaling concerns.
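The trade-off between the two modes can be made concrete with back-of-envelope arithmetic. The Python sketch below is illustrative only and is not the paper's model. The helper names (hybrid_transfer_seconds, gpus_needed_for_tables) are hypothetical. Several simplifications are assumptions: in the hybrid mode, pooled embeddings (forward) and their gradients (backward) each cross the CPU-GPU link once per iteration, and in the GPU-only mode only table weights count toward HBM. The example figures (batch size, table count, embedding dimension, 16 GB/s link, 80 GB of HBM per GPU) are also hypothetical.

```python
import math


def hybrid_transfer_seconds(batch_size, num_tables, emb_dim, bytes_per_elem, link_gb_per_s):
    """Estimate per-iteration CPU<->GPU transfer time in the hybrid mode.

    Hypothetical model: the CPU pools each table's lookups into one vector per
    sample, sends those vectors to the GPU in the forward pass, and receives the
    same amount of gradient data back in the backward pass.
    """
    bytes_one_way = batch_size * num_tables * emb_dim * bytes_per_elem
    bytes_round_trip = 2 * bytes_one_way  # forward activations + backward gradients
    return bytes_round_trip / (link_gb_per_s * 1e9)


def gpus_needed_for_tables(num_tables, rows_per_table, emb_dim, bytes_per_elem, hbm_gb_per_gpu):
    """Estimate how many GPUs are needed to hold all embedding tables in HBM.

    Counts only the table weights; optimizer state, activations, and the MLP
    layers would raise the real requirement.
    """
    table_bytes = num_tables * rows_per_table * emb_dim * bytes_per_elem
    return math.ceil(table_bytes / (hbm_gb_per_gpu * 1e9))


if __name__ == "__main__":
    # Hypothetical workload: 4096-sample batches, 100 tables, 128-dim fp32 vectors.
    t = hybrid_transfer_seconds(4096, 100, 128, 4, link_gb_per_s=16.0)
    print(f"hybrid transfer per iteration: {t * 1e3:.1f} ms")
    # Hypothetical tables: 10 million rows each, on GPUs with 80 GB of HBM.
    n = gpus_needed_for_tables(100, 10_000_000, 128, 4, hbm_gb_per_gpu=80.0)
    print(f"GPUs needed for tables alone: {n}")
```

With these hypothetical figures the sketch reports about 26.2 ms of link traffic per iteration in the hybrid mode, and 7 GPUs just to hold the 512 GB of table weights in the GPU-only mode: the transfer cost of one mode versus the hardware cost of the other.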
Hardware Architecture, Artificial Intelligence, Machine Learning