Abstract: Neural networks, as powerful tools for data mining and knowledge engineering, can learn from data to build feature‐based classifiers and nonlinear predictive models. Training neural networks involves the optimization of nonconvex objective functions, and the learning process is usually costly and infeasible for applications associated with data streams. A possible, albeit counterintuitive, alternative is to randomly assign a subset of the networks’ weights so that the resulting optimization task can be formulated as a linear least‐squares problem. This methodology can be applied to both feedforward and recurrent networks, and similar techniques can be used to approximate kernel functions. Many experimental results indicate that such randomized models can reach sound performance compared to fully adaptable ones, with a number of favorable benefits, including (1) simplicity of implementation, (2) faster learning with less human intervention, and (3) the possibility of leveraging standard linear regression and classification algorithms (e.g., ℓ1 norm minimization for obtaining sparse formulations). This class of neural networks is attractive and valuable to the data mining community, particularly for handling large‐scale data mining in real time. However, the literature in the field is extremely vast and fragmented, with many results being reintroduced multiple times under different names. This overview aims to provide a self‐contained, uniform introduction to the different ways in which randomization can be applied to the design of neural networks and kernel functions. A clear exposition of the basic framework underlying all these approaches helps to clarify innovative lines of research and open problems and, most importantly, to foster the exchange of well‐known results across different communities.

WIREs Data Mining Knowl Discov 2017, 7:e1200. doi: 10.1002/widm.1200

This article is categorized under: Technologies > Machine Learning
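The central idea of the abstract, fixing a random subset of weights so that training reduces to linear least squares, can be illustrated with a minimal sketch. It assumes a single hidden layer with a tanh activation, Gaussian random hidden weights, and a small ridge penalty for numerical stability; none of these choices, nor the names `fit_random_network`, `predict`, `n_hidden`, and `ridge`, come from the source, and the toy sine-regression data is purely illustrative.

```python
import numpy as np

def fit_random_network(X, y, n_hidden=200, ridge=1e-3, seed=0):
    """Fit only the output weights of a one-hidden-layer network.

    The hidden weights W and biases b are drawn at random and never trained
    (an assumption of this sketch), so the remaining problem is linear.
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))  # fixed random input weights
    b = rng.normal(size=n_hidden)                # fixed random biases
    H = np.tanh(X @ W + b)                       # random hidden features
    # Ridge-regularized least squares: (H^T H + ridge * I) beta = H^T y
    A = H.T @ H + ridge * np.eye(n_hidden)
    beta = np.linalg.solve(A, H.T @ y)
    return W, b, beta

def predict(X, W, b, beta):
    """Apply the fixed random hidden layer, then the learned linear readout."""
    return np.tanh(X @ W + b) @ beta

# Toy regression task (illustrative data only): learn sin(x) on [-3, 3].
rng = np.random.default_rng(1)
X_train = rng.uniform(-3, 3, size=(500, 1))
y_train = np.sin(X_train[:, 0])

W, b, beta = fit_random_network(X_train, y_train)

X_test = np.linspace(-3, 3, 50).reshape(-1, 1)
mse = float(np.mean((predict(X_test, W, b, beta) - np.sin(X_test[:, 0])) ** 2))
print("test MSE:", mse)
```

Because only `beta` is learned, the fit is a single linear solve rather than an iterative nonconvex optimization. Replacing the ridge solve with another linear method (for example, an ℓ1-penalized solver to obtain a sparse readout) would leave the random hidden layer unchanged.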

Neural Networks Learn Statistics of Increasing Complexity

Learning from higher-order statistics, efficiently: hypothesis tests, random features, and neural networks

A distributional simplicity bias in the learning dynamics of transformers

Statistical signatures of abstraction in deep neural networks

Identifying Spurious Biases Early in Training through the Lens of Simplicity Bias

Hierarchical Simplicity Bias of Neural Networks

Simplicity Bias of Two-Layer Networks beyond Linearly Separable Data

Reliable and Efficient Inference of Bayesian Networks from Sparse Data by Statistical Learning Theory

Deep learning systems as complex networks

On Diversity in Discriminative Neural Networks

Generalizing similarity in noisy setups: the DIBS phenomenon

Neural Network Characterization and Entropy Regulated Data Balancing through Principal Component Analysis

Statistical Features in Learning

Randomness in Neural Networks: an Overview

Neural Scaling Laws Rooted in the Data Distribution

Complexity from Adaptive-Symmetries Breaking: Global Minima in the Statistical Mechanics of Deep Neural Networks

Distribution learning via neural differential equations: a nonparametric statistical perspective

Feature Contamination: Neural Networks Learn Uncorrelated Features and Fail to Generalize

The twin peaks of learning neural networks

Discrete Distribution Networks

Learning low-rank latent mesoscale structures in networks