Knowledge distillation of convolutional neural network models on inaccurately labeled data for automatic text CAPTCHA recognition on mobile devices

V.I. Terekhov, D.O. Ishkov
DOI: https://doi.org/10.18127/j19997493-202103-01
2021-01-01
Abstract: Problem definition: most existing works investigate the recognition of fixed-length CAPTCHAs, whereas the authors propose using knowledge distillation to imitate recurrent-convolutional models, which have performed well at predicting a variable number of characters in an image. The rapid development of deep learning systems, whose recognition quality has reached the level of human vision, makes CAPTCHA-based protection increasingly ineffective. In addition, such protection imposes high requirements on the hardware of the devices on which recognition is performed. The research carried out in this work allowed us to propose an effective method of training CNNs on inaccurately labeled data for the automatic circumvention of text CAPTCHAs on mobile devices. Purpose: to obtain a lightweight, high-quality model for text CAPTCHA recognition that can run on mobile devices. Results: the paper describes a method for training a lightweight model on inaccurate labels produced by another model. Using a popular social network as an example, the authors study how the size of the training sample affects recognition quality and measure the model's speed on various end devices. Practical significance: the proposed method makes it possible to train convolutional models that bypass text CAPTCHA, a website protection mechanism, while remaining undemanding of device hardware. An analysis of the model's errors yields recommendations for improving countermeasures against automatic recognition.
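The abstract does not say how the lightweight CNN represents a variable number of characters or which loss is used for distillation. One common scheme, assumed here rather than taken from the paper, gives the student one classification head per character position plus an extra "padding" class for positions past the end of the string, and trains it with per-position cross-entropy against the teacher's hard (and possibly wrong) transcriptions. The alphabet, `MAX_LEN`, and all function names below are hypothetical; the sketch covers only label encoding, decoding, and the loss, not the authors' actual training code.

```python
import math
from typing import List, Sequence

# Hypothetical alphabet and maximum CAPTCHA length (assumptions, not from the paper).
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"
MAX_LEN = 7
PAD = len(ALPHABET)              # extra "no character here" class
NUM_CLASSES = len(ALPHABET) + 1  # one output per character, plus PAD


def encode_teacher_label(text: str) -> List[int]:
    """Map a teacher (e.g. CRNN) transcription to MAX_LEN class indices, padding the tail."""
    if len(text) > MAX_LEN:
        raise ValueError(f"label longer than MAX_LEN: {text!r}")
    idx = [ALPHABET.index(ch) for ch in text]  # raises ValueError on unknown characters
    return idx + [PAD] * (MAX_LEN - len(idx))


def decode_student_output(class_ids: Sequence[int]) -> str:
    """Read the student's per-position predictions left to right, stopping at the first PAD."""
    chars = []
    for c in class_ids:
        if c == PAD:
            break
        chars.append(ALPHABET[c])
    return "".join(chars)


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one head's logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def distillation_loss(student_logits: Sequence[Sequence[float]], teacher_text: str) -> float:
    """Mean per-position cross-entropy of the student's heads against the teacher's hard label."""
    target = encode_teacher_label(teacher_text)
    total = 0.0
    for head_logits, t in zip(student_logits, target):
        total -= log_softmax(head_logits)[t]
    return total / MAX_LEN
```

Because the teacher's labels are treated as ground truth, any teacher mistakes are copied into the targets; this is the "inaccurately labeled data" setting the abstract refers to. A soft-label variant (matching the teacher's per-position probabilities) would be an alternative design, but the abstract does not indicate which one the authors chose.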