PrecisionProbe: Non-intrusive Performance Analysis Tool for Deep Learning Recommendation Models

Weiyu Peng, Jinghao Wang, Tianyu Wo, Renyu Yang
DOI: https://doi.org/10.1109/jcc62314.2024.00010
2024-01-01
Abstract: Deep learning recommendation models (DLRM) exploit user behaviors such as clicks, browsing footprints, and preferences to improve personalized experiences. However, in the face of exponentially growing user data, such models require ever more GPU resources, which are often unaffordable or insufficient in a computing cluster. To improve GPU utilization and facilitate advances in GPU scheduling algorithms, we present PrecisionProbe, a non-intrusive monitoring and analysis tool that runs on Kubernetes and performs sophisticated analytics of GPU resource utilization without altering the existing training code. PrecisionProbe captures fine-grained GPU metrics at the level of individual model layers, and exploring these detailed metrics gives a precise understanding of resource consumption patterns. This mechanism is crucial for devising effective GPU scheduling algorithms, particularly those tailored to DLRM training jobs and dependent on their consumption patterns. Experimental results show that recommendation models, as opposed to CV and NLP models, use less FP32 processing but have higher memory interaction frequencies. These findings indicate the unique resource needs of recommendation systems and motivate performance analysis with PrecisionProbe.