Efficient ResNets: Residual Network Design

Aditya Thakur, Harish Chauhan, Nikunj Gupta
2023-06-21
Abstract: ResNets (Residual Networks) are among the most commonly used models for image classification. In this project, we design and train a modified ResNet model for CIFAR-10 image classification. In particular, we aim to maximize test accuracy on the CIFAR-10 benchmark while keeping the size of our ResNet model within a fixed budget of 5 million trainable parameters. Model size, typically measured as the number of trainable parameters, matters when models must be stored on devices with limited storage capacity (e.g. IoT/edge devices). In this article, we present our residual network design, which has fewer than 5 million parameters. We show that, when equipped with a number of training strategies and suitable ResNet hyperparameters, our ResNet achieves a test accuracy of 96.04% on CIFAR-10, which is much higher than that of ResNet18 (which has more than 11 million trainable parameters). Models and code are available at https://github.com/Nikunj-Gupta/Efficient_ResNets.
Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
The paper sets out to design a modified ResNet that stays under a budget of 5 million trainable parameters while achieving the highest possible test accuracy on the CIFAR-10 dataset. To meet the parameter constraint, the authors tune the architecture's hyperparameters and combine several training strategies. Although the standard ResNet18 has over 11 million parameters, their smaller model reaches a final test accuracy of 96.04%, far exceeding the unoptimized ResNet18 model (approximately 90%). The paper also describes in detail the design choices and techniques employed: convolution kernel size, channel configuration, residual block design, batch normalization, dropout, gradient clipping, and the Lookahead optimizer. Together, these choices improve the model's accuracy while reducing the number of parameters.
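To make the parameter budget concrete, the following is a minimal sketch of how the trainable parameter count of a ResNet depends on its channel configuration and kernel sizes. It is not the authors' code. It assumes a common CIFAR-style ResNet layout (a 3x3 stem convolution, two-convolution basic blocks, bias-free convolutions, batch normalization with two parameters per channel, 1x1 projection shortcuts where the shape changes, and a single linear classifier). The function name `count_resnet_params` and the halved-width configuration are hypothetical and chosen only for illustration.

```python
def conv_params(c_in, c_out, k):
    """Weights of a bias-free k x k convolution."""
    return c_in * c_out * k * k


def bn_params(c):
    """Batch normalization: one scale and one shift per channel."""
    return 2 * c


def count_resnet_params(blocks_per_stage, channels, kernel=3,
                        shortcut_kernel=1, in_channels=3, num_classes=10):
    """Count trainable parameters of a basic-block ResNet (assumed layout)."""
    # Stem: one convolution followed by batch normalization.
    total = conv_params(in_channels, channels[0], kernel) + bn_params(channels[0])
    c_in = channels[0]
    for stage, (n_blocks, c_out) in enumerate(zip(blocks_per_stage, channels)):
        for block in range(n_blocks):
            # Assumption: every stage after the first downsamples in its first block.
            stride = 2 if (stage > 0 and block == 0) else 1
            # Two conv + BN pairs on the residual path.
            total += conv_params(c_in, c_out, kernel) + bn_params(c_out)
            total += conv_params(c_out, c_out, kernel) + bn_params(c_out)
            # Projection shortcut only when the identity cannot be added directly.
            if stride != 1 or c_in != c_out:
                total += conv_params(c_in, c_out, shortcut_kernel) + bn_params(c_out)
            c_in = c_out
    # Linear classifier (weights plus biases) after global average pooling.
    total += c_in * num_classes + num_classes
    return total


# ResNet18-style layout: four stages of two blocks each.
resnet18 = count_resnet_params([2, 2, 2, 2], [64, 128, 256, 512])

# Hypothetical narrower variant (NOT the authors' configuration): halved widths.
narrow = count_resnet_params([2, 2, 2, 2], [32, 64, 128, 256])

print(f"ResNet18-style: {resnet18:,} parameters")
print(f"Halved widths:  {narrow:,} parameters")
```

Under these assumptions the ResNet18-style layout comes to 11,173,962 parameters, consistent with the "over 11 million" figure quoted above. Because convolution weights scale with the product of input and output channels, halving every width cuts the count to roughly a quarter, which is why channel configuration is a natural lever for fitting under a 5 million parameter budget.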