Convolutional Neural Networks Applied to House Numbers Digit Classification

Pierre Sermanet, Soumith Chintala, Yann LeCun
DOI: https://doi.org/10.48550/arXiv.1204.3968
2012-04-18
Abstract: We classify digits of real-world house numbers using convolutional neural networks (ConvNets). ConvNets are hierarchical feature learning neural networks whose structure is biologically inspired. Unlike many popular vision approaches that are hand-designed, ConvNets can automatically learn a unique set of features optimized for a given task. We augmented the traditional ConvNet architecture by learning multi-stage features and by using Lp pooling and establish a new state-of-the-art of 94.85% accuracy on the SVHN dataset (45.2% error improvement). Furthermore, we analyze the benefits of different pooling methods and multi-stage features in ConvNets. The source code and a tutorial are available at http://eblearn.sf.net.
Computer Vision and Pattern Recognition, Machine Learning, Neural and Evolutionary Computing
What problem does this paper attempt to address?
This paper addresses the classification of house-number digits photographed in complex natural scenes. The task is harder than character recognition in documents, mainly because of low background contrast, low image resolution, out-of-focus or motion-blurred images, and large variations in lighting. As a result, the best existing methods still lag behind human performance. To meet these challenges, the authors use convolutional neural networks (ConvNets), hierarchical feature-learning neural networks whose structure is biologically inspired. Unlike many vision approaches that rely on hand-designed features, ConvNets automatically learn a set of features optimized for the task at hand. By learning multi-stage features and using Lp pooling, the authors establish a new state of the art of 94.85% accuracy on the SVHN dataset, a 45.2% reduction in error relative to the previous best method. The paper also analyzes the benefits of different pooling methods and of multi-stage features in ConvNets.
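To make the idea of Lp pooling concrete, the following is a minimal sketch in plain Python rather than the authors' implementation. It assumes uniform weights over each pooling window and non-overlapping windows; the function names `lp_pool` and `lp_pool2d` are hypothetical, and the exact weighting and window layout used in the paper are not reproduced here. Under these assumptions, P = 1 reduces to average pooling of absolute values and P = infinity reduces to max pooling, with intermediate values of P falling between the two.

```python
import math


def lp_pool(window, p):
    """Pool one 2-D window of activations into a single value with an Lp norm.

    Assumption: every element gets the same weight 1/n (uniform weighting).
    """
    values = [abs(v) for row in window for v in row]
    if math.isinf(p):
        # Limit case: the Lp norm converges to the largest magnitude.
        return max(values)
    weight = 1.0 / len(values)
    return sum(weight * v ** p for v in values) ** (1.0 / p)


def lp_pool2d(feature_map, size, p):
    """Apply lp_pool over non-overlapping size x size windows of a 2-D map."""
    out_rows = len(feature_map) // size
    out_cols = len(feature_map[0]) // size
    pooled = []
    for r in range(out_rows):
        pooled_row = []
        for c in range(out_cols):
            window = [
                row[c * size:(c + 1) * size]
                for row in feature_map[r * size:(r + 1) * size]
            ]
            pooled_row.append(lp_pool(window, p))
        pooled.append(pooled_row)
    return pooled


if __name__ == "__main__":
    fmap = [
        [1.0, 2.0, 0.0, 0.0],
        [3.0, 4.0, 0.0, 8.0],
    ]
    for p in (1, 2, float("inf")):
        print(p, lp_pool2d(fmap, 2, p))
```

Varying `p` in the loop at the bottom shows how the pooled value moves from the window average toward the window maximum as P grows, which is the trade-off the paper's comparison of pooling methods explores.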