POSTER: FineCo: Fine-grained Heterogeneous Resource Management for Concurrent DNN Inferences

Lixian Ma, Haoruo Chen, En Shao, Leping Wang, Quan Chen, Guangming Tan
DOI: https://doi.org/10.1145/3627535.3638485
2024-01-01
Abstract: Co-locating multiple DNN serving workloads so that they share GPU resources is widely used to improve resource utilization while guaranteeing user QoS. Existing GPU sharing mechanisms are restricted to the model level, yet resource demands fluctuate at the kernel level, so current sharing mechanisms leave GPU resources underutilized. We design FineCo, a multi-DNN serving system that leverages a novel fine-grained resource sharing mechanism to optimize concurrent inference without modifying the hardware or operating system. Our prototype implementation demonstrates that FineCo achieves up to 40% higher throughput than state-of-the-art work.
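To make the model-level versus kernel-level argument concrete, the toy sketch below uses made-up numbers; it is not FineCo's algorithm and not a measurement from the paper. It assumes a hypothetical GPU with 80 SMs and two models whose kernels run in lockstep, each kernel needing a fixed number of SMs. If each model statically reserves its peak kernel demand, the two reservations together exceed the GPU. Because the demands fluctuate out of phase, however, the kernels that actually run concurrently never need more than the GPU provides.

```python
"""Toy illustration (hypothetical numbers, not from the paper): why model-level
GPU partitioning can waste resources when per-kernel demand fluctuates."""

from typing import List, Sequence

# Hypothetical GPU with 80 SMs; each list is one model's SM demand per kernel
# step, assuming (for simplicity) that both models issue kernels in lockstep.
GPU_SMS = 80
MODEL_A = [64, 16, 64, 16]
MODEL_B = [16, 48, 16, 48]


def model_level_reservation(models: Sequence[List[int]]) -> int:
    """SMs needed if each model statically reserves its peak kernel demand."""
    return sum(max(m) for m in models)


def kernel_level_peak(models: Sequence[List[int]]) -> int:
    """Largest combined demand of the kernels running concurrently at any step."""
    return max(sum(step) for step in zip(*models))


if __name__ == "__main__":
    static = model_level_reservation([MODEL_A, MODEL_B])
    dynamic = kernel_level_peak([MODEL_A, MODEL_B])
    print(f"model-level reservation: {static} SMs (fits: {static <= GPU_SMS})")
    print(f"kernel-level peak: {dynamic} SMs (fits: {dynamic <= GPU_SMS})")
```

With these illustrative values, static peak reservation needs 112 SMs and does not fit, while the kernel-level peak is 80 SMs and fits exactly.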