Abstract: Split computing is a promising approach to reducing the inference latency of deep neural network (DNN) models. In this paper, we propose a two-phase split computing framework (TSCF). In TSCF, for vertical inter-layer splitting between computing nodes at different levels (e.g., central and edge clouds), we formulate a shortest path problem in a directed graph and devise a pruning-based low-complexity solution. In addition, for horizontal intra-layer splitting between computing nodes at the same level (e.g., edge clouds), the execution units of a specific layer are further divided and distributed among those nodes in proportion to their available resources. The evaluation results demonstrate that, by efficiently using the resources of distributed computing nodes, TSCF can reduce inference latency by more than 38.8% compared to the traditional inter-layer splitting scheme. It is also demonstrated that near-optimal inference latency can be achieved even with the pruning-based low-complexity solution.
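The two phases can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the paper. For the vertical phase, we assume a layered graph whose vertices are (layer, computing node) pairs and whose edge weights are "tensor transfer time + layer compute time." A shortest path through this graph then gives a layer-to-node placement. The sketch solves it exactly by dynamic programming and does not reproduce the paper's pruning heuristic. For the horizontal phase, the execution units of one layer (e.g., output channels) are split across same-level nodes in proportion to a capacity score. Largest-remainder rounding is our own choice for keeping the shares integral. All function names, parameters, and numbers are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def vertical_split(
    compute_ms: Sequence[Sequence[float]],   # hypothetical: compute_ms[l][k] = time of layer l on node k
    tensor_size: Sequence[float],            # hypothetical: size of the tensor entering layer l
    bandwidth: Sequence[Sequence[float]],    # hypothetical: bandwidth[j][k], size units per ms
    source: int = 0,                         # node holding the model input (assumption)
) -> Tuple[float, List[int]]:
    """Exact shortest path over an assumed layered DAG (no pruning)."""
    n_layers, n_nodes = len(compute_ms), len(bandwidth)
    # dist[k]: lowest latency so far with the current tensor held on node k
    dist = [math.inf] * n_nodes
    dist[source] = 0.0
    parents: List[List[int]] = []
    for layer in range(n_layers):
        new_dist = [math.inf] * n_nodes
        parent = [-1] * n_nodes
        for k in range(n_nodes):          # node that would execute this layer
            for j in range(n_nodes):      # node holding the incoming tensor
                if dist[j] == math.inf:
                    continue
                move = 0.0 if j == k else tensor_size[layer] / bandwidth[j][k]
                cand = dist[j] + move + compute_ms[layer][k]
                if cand < new_dist[k]:
                    new_dist[k], parent[k] = cand, j
        dist = new_dist
        parents.append(parent)
    # Trace the placement back from the node that finishes the last layer earliest
    node = min(range(n_nodes), key=lambda k: dist[k])
    best = dist[node]
    placement = [0] * n_layers
    for layer in range(n_layers - 1, -1, -1):
        placement[layer] = node
        node = parents[layer][node]
    return best, placement


def horizontal_split(units: int, capacities: Sequence[float]) -> List[int]:
    """Divide a layer's execution units in proportion to node capacities."""
    total = sum(capacities)
    raw = [units * c / total for c in capacities]
    shares = [int(r) for r in raw]                 # floor of each ideal share
    leftover = units - sum(shares)
    # Give the remaining units to nodes with the largest fractional parts
    order = sorted(range(len(raw)), key=lambda i: raw[i] - shares[i], reverse=True)
    for i in order[:leftover]:
        shares[i] += 1
    return shares


if __name__ == "__main__":
    # Toy setup: node 0 = edge, node 1 = cloud (hypothetical numbers)
    latency, placement = vertical_split(
        compute_ms=[[5.0, 1.0], [5.0, 1.0], [5.0, 1.0]],
        tensor_size=[10.0, 2.0, 2.0],
        bandwidth=[[math.inf, 1.0], [1.0, math.inf]],
    )
    print(latency, placement)              # 9.0 [0, 1, 1]
    print(horizontal_split(10, [1, 1, 1]))  # [4, 3, 3]
```

In this toy case, the first layer stays on the edge node because shipping the large input is expensive. The remaining layers move to the faster cloud node once the intermediate tensor has shrunk.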

Two-Phase Split Computing Framework in Edge-Cloud Continuum