Joint architecture and knowledge distillation in CNN for Chinese text recognition

Zi-Rui Wang, Jun Du
DOI: https://doi.org/10.1016/j.patcog.2020.107722
IF: 8
2021-03-01
Pattern Recognition
Abstract:<p>The distillation technique helps transform cumbersome neural networks into compact networks so that models can be deployed on alternative hardware devices. The main advantages of distillation-based approaches include a simple training process, support from most off-the-shelf deep learning software, and no special hardware requirements. In this paper, we propose a guideline for distilling the architecture and knowledge of pretrained standard CNNs. The proposed algorithm is first verified on a large-scale task: offline handwritten Chinese text recognition (HCTR). Compared with the CNN in the state-of-the-art system, the reconstructed compact CNN can reduce the computational cost by <span class="math"><math>&gt;10×</math></span> and the model size by <span class="math"><math>&gt;8×</math></span> with negligible accuracy loss. Then, by conducting experiments on two additional classification task datasets, <em>Chinese Text in the Wild</em> (CTW) and MNIST, we demonstrate that the proposed approach can also be successfully applied to mainstream backbone networks.</p>
computer science, artificial intelligence; engineering, electrical & electronic