Abstract: Machine-learning technology powers many aspects of modern society: from web searches to content filtering on social networks to recommendations on e-commerce websites, and it is increasingly present in consumer products such as cameras and smartphones. Machine-learning systems are used to identify objects in images, transcribe speech into text, match news items, posts or products with users’ interests, and select relevant search results. Increasingly, these applications make use of a class of techniques called deep learning. Conventional machine-learning techniques were limited in their ability to process natural data in their raw form. For decades, constructing a pattern-recognition or machine-learning system required careful engineering and considerable domain expertise to design a feature extractor that transformed the raw data (such as the pixel values of an image) into a suitable internal representation or feature vector from which the learning subsystem, often a classifier, could detect or classify patterns in the input. Representation learning is a set of methods that allows a machine to be fed with raw data and to automatically discover the representations needed for detection or classification. Deep-learning methods are representation-learning methods with multiple levels of representation, obtained by composing simple but non-linear modules that each transform the representation at one level (starting with the raw input) into a representation at a higher, slightly more abstract level. With the composition of enough such transformations, very complex functions can be learned. For classification tasks, higher layers of representation amplify aspects of the input that are important for discrimination and suppress irrelevant variations.
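The idea of "composing simple but non-linear modules" can be sketched in a few lines of Python. This is a minimal forward-pass sketch under stated assumptions: the ReLU non-linearity, the layer widths, and the helper `random_layer` are illustrative choices not taken from the text, and the weights are random rather than learned, so it shows only how each module re-represents the output of the one below it.

```python
import random
from typing import Callable, List

Vector = List[float]
Layer = Callable[[Vector], Vector]


def relu(z: float) -> float:
    """Simple non-linearity: pass positive values, zero out the rest."""
    return z if z > 0.0 else 0.0


def make_layer(weights: List[Vector], biases: Vector) -> Layer:
    """One simple module: an affine map followed by an element-wise ReLU."""
    def layer(x: Vector) -> Vector:
        return [relu(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(weights, biases)]
    return layer


def compose(layers: List[Layer]) -> Layer:
    """Chain modules so each one re-represents the output of the previous one."""
    def network(x: Vector) -> Vector:
        h = x  # level 0: the raw input
        for layer in layers:
            h = layer(h)  # next, higher-level representation
        return h
    return network


def random_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Hypothetical helper: random, untrained weights (training is not shown)."""
    weights = [[rng.gauss(0.0, n_in ** -0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return make_layer(weights, [0.0] * n_out)


rng = random.Random(0)
sizes = [8, 16, 16, 4]  # assumed widths: raw input, two hidden levels, output
net = compose([random_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])])
raw_input = [rng.random() for _ in range(sizes[0])]
print(net(raw_input))  # a 4-dimensional, non-negative representation
```

The non-linearity is what makes depth worthwhile: without `relu`, a stack of affine maps would collapse into a single affine map, so adding layers would not let the network represent more complex functions.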
An image, for example, comes in the form of an array of pixel values, and the learned features in the first layer of representation typically represent the presence or absence of edges at particular orientations and locations in the image. The second layer typically detects motifs by spotting particular arrangements of edges, regardless of small variations in the edge positions. The third layer may assemble motifs into larger combinations that correspond to parts of familiar objects, and subsequent layers would detect objects as combinations of these parts. The key aspect of deep learning is that these layers of features are not designed by human engineers: they are learned from data using a general-purpose learning procedure. Deep learning is making major advances in solving problems that have resisted the best attempts of the artificial intelligence community for many years. It has turned out to be very good at discovering intricate structures in high-dimensional data and is therefore applicable to many domains of science, business and government. In addition to beating records in image recognition and speech recognition, it has beaten other machine-learning techniques at predicting the activity of potential drug molecules, analysing particle accelerator data, reconstructing brain circuits, and predicting the effects of mutations in non-coding DNA on gene expression and disease. Perhaps more surprisingly, deep learning has produced extremely promising results for various tasks in natural language understanding, particularly topic classification, sentiment analysis, question answering and language translation. We think that deep learning will have many more successes in the near future because it requires very little engineering by hand, so it can easily take advantage of increases in the amount of available computation and data. New learning algorithms and architectures that are currently being developed for deep neural networks will only accelerate this progress.
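The abstract states that the feature layers are "learned from data using a general-purpose learning procedure" without naming it here. As an assumption, the sketch below uses backpropagation with full-batch gradient descent on a toy XOR task; the task, hidden width, learning rate, and epoch count are illustrative choices, not values from the text. The hidden tanh units start random and are shaped only by the gradient of the loss, which is the sense in which the features are learned rather than hand-designed.

```python
import math
import random
from typing import List, Tuple

# Toy dataset (an assumption for illustration): XOR is not linearly separable,
# so the hidden layer has to build an intermediate representation.
DATA: List[Tuple[List[float], float]] = [
    ([0.0, 0.0], 0.0),
    ([0.0, 1.0], 1.0),
    ([1.0, 0.0], 1.0),
    ([1.0, 1.0], 0.0),
]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_xor(hidden: int = 8, lr: float = 0.5, epochs: int = 3000,
              seed: int = 0) -> List[float]:
    """Train a 2-layer net with backprop; return the mean loss per epoch."""
    rng = random.Random(seed)
    # Hidden layer (tanh) and one sigmoid output unit, randomly initialised.
    w1 = [[rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.gauss(0.0, 1.0) for _ in range(hidden)]
    b2 = 0.0
    n = len(DATA)
    losses = []
    for _ in range(epochs):
        gw1 = [[0.0, 0.0] for _ in range(hidden)]
        gb1 = [0.0] * hidden
        gw2 = [0.0] * hidden
        gb2 = 0.0
        loss = 0.0
        for x, y in DATA:
            # Forward pass.
            h = [math.tanh(w[0] * x[0] + w[1] * x[1] + b) for w, b in zip(w1, b1)]
            p = sigmoid(sum(v * hj for v, hj in zip(w2, h)) + b2)
            p_safe = min(max(p, 1e-12), 1.0 - 1e-12)  # avoid log(0) in the loss
            loss -= y * math.log(p_safe) + (1.0 - y) * math.log(1.0 - p_safe)
            # Backward pass: for sigmoid + cross-entropy, dL/dz_out = p - y.
            d_out = p - y
            gb2 += d_out
            for j in range(hidden):
                gw2[j] += d_out * h[j]
                d_h = d_out * w2[j] * (1.0 - h[j] ** 2)  # tanh'(z) = 1 - tanh(z)^2
                gb1[j] += d_h
                gw1[j][0] += d_h * x[0]
                gw1[j][1] += d_h * x[1]
        losses.append(loss / n)
        # Gradient-descent step using gradients averaged over the dataset.
        b2 -= lr * gb2 / n
        for j in range(hidden):
            w2[j] -= lr * gw2[j] / n
            b1[j] -= lr * gb1[j] / n
            w1[j][0] -= lr * gw1[j][0] / n
            w1[j][1] -= lr * gw1[j][1] / n
    return losses


losses = train_xor()
print(f"mean loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

XOR is a convenient test because a single logistic unit with no hidden layer cannot bring the mean cross-entropy on it below ln 2 ≈ 0.693 (its best fit predicts 0.5 everywhere), so loss values below that indicate the hidden units have learned a useful intermediate representation.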