Pruning‐guided feature distillation for an efficient transformer‐based pose estimation model

Dong‐hwi Kim, Dong‐hun Lee, Aro Kim, Jinwoo Jeong, Jong Taek Lee, Sungjei Kim, Sang‐hyo Park
DOI: https://doi.org/10.1049/cvi2.12277
IF: 1.484
2024-04-02
IET Computer Vision
Abstract: The authors propose a compression strategy for a transformer‐based 3D human pose estimation model; such models yield high accuracy but at the cost of increased model size. The approach uses a pruning‐guided determination of the search range to achieve lightweight pose estimation under limited training time and to identify the optimal model size. In addition, the authors propose a transformer‐based feature distillation (TFD) method, which exploits the characteristics of the transformer architecture to obtain a pose estimation model that is efficient in terms of both model size and accuracy. To the best of the authors' knowledge, pruning‐guided TFD is the first approach proposed for 3D human pose estimation that employs transformer architecture. The proposed approach was tested on various large data sets, and the results show that it can reduce the model size by 30% compared to the state‐of‐the‐art while ensuring high accuracy.
computer science, artificial intelligence; engineering, electrical & electronic