Abstract: We present two new approaches for point prediction with streaming data. One is based on the Count-Min sketch (CMS) and the other is based on Gaussian process priors with a random bias. These methods are intended for the most general predictive problems, where no true model can be usefully formulated for the data stream. In statistical contexts, this is often called the $\mathcal{M}$-open problem class. Under the assumption that the data consist of i.i.d. samples from a fixed distribution function $F$, we show that the CMS-based estimates of the distribution function are consistent. We compare our new methods with two established predictors in terms of cumulative $L^1$ error. One is based on the Shtarkov solution (often called the normalized maximum likelihood) in the normal experts setting and the other is based on Dirichlet process priors. We make these comparisons in two cases. The first is one-pass, meaning that the predictors are updated using the fact that the CMS is a sketch. In the second, for predictors that are not one-pass, we use streaming $K$-means to maintain a representative subset of fixed size that can be updated as data accumulate. Preliminary computational work suggests that the one-pass median version of the CMS method is rarely outperformed by the other methods for sufficiently complex data. We also find that predictors based on Gaussian process priors with random biases perform well. The Shtarkov predictors we use here did not perform as well, probably because we used only the simplest example. The other predictors seemed to perform well mainly when the data did not look like they came from an $\mathcal{M}$-open data generator.
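To make the one-pass idea concrete, the following is a minimal sketch of a CMS-based median predictor scored by cumulative $L^1$ error. It is an illustration under stated assumptions, not the construction used in the paper: the fixed range `[lo, hi)`, the equal-width binning, the hash family, and all names (`CountMinSketch`, `predict_median`, `run_stream`) are hypothetical choices made here. Each observation is mapped to a bin, the bin index is added to the sketch, and the predictor is the midpoint of the first bin at which the estimated distribution function reaches 1/2.

```python
import random


class CountMinSketch:
    """Count-Min sketch over integer keys (here: bin indices)."""

    def __init__(self, width=272, depth=4, seed=0):
        rng = random.Random(seed)
        self.width = width
        self.depth = depth
        self.prime = (1 << 61) - 1  # modulus for the hash family (an assumption)
        # One (a, b) pair per row defines the hash h(k) = ((a*k + b) mod p) mod width.
        self.hashes = [
            (rng.randrange(1, self.prime), rng.randrange(0, self.prime))
            for _ in range(depth)
        ]
        self.table = [[0] * width for _ in range(depth)]
        self.total = 0

    def _index(self, row, key):
        a, b = self.hashes[row]
        return ((a * key + b) % self.prime) % self.width

    def add(self, key, count=1):
        for row in range(self.depth):
            self.table[row][self._index(row, key)] += count
        self.total += count

    def estimate(self, key):
        # Collisions only inflate counters, so the minimum over rows
        # never underestimates the true count.
        return min(self.table[row][self._index(row, key)]
                   for row in range(self.depth))


def to_bin(x, n_bins, lo, hi):
    """Map a real value to an equal-width bin index, clamped to the range."""
    b = int((x - lo) / (hi - lo) * n_bins)
    return min(n_bins - 1, max(0, b))


def predict_median(sketch, n_bins, lo, hi):
    """Midpoint of the first bin where the estimated CDF reaches 1/2."""
    counts = [sketch.estimate(b) for b in range(n_bins)]
    total = sum(counts)
    if total == 0:
        return 0.5 * (lo + hi)  # no data yet: fall back to the range midpoint
    cum = 0
    for b, c in enumerate(counts):
        cum += c
        if 2 * cum >= total:
            return lo + (b + 0.5) * (hi - lo) / n_bins
    return hi  # not reached, since cum equals total at the last bin


def run_stream(values, n_bins=20, lo=0.0, hi=1.0):
    """One pass: predict, observe, accumulate |error|, update the sketch."""
    sketch = CountMinSketch()
    cumulative_l1 = 0.0
    for y in values:
        cumulative_l1 += abs(y - predict_median(sketch, n_bins, lo, hi))
        sketch.add(to_bin(y, n_bins, lo, hi))
    return cumulative_l1, sketch
```

Each observation is used once and then discarded, so memory stays fixed at `width * depth` counters regardless of stream length. Normalizing by the sum of the estimated bin counts, rather than by the number of observations, keeps the estimated distribution function proper even when hash collisions inflate some counts.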

Streamed Learning: One-Pass SVMs

Streaming View Learning.

Learning with Feature Evolvable Streams.

STREAM: A Universal State-Space Model for Sparse Geometric Data

Streaming Classification with Emerging New Class by Class Matrix Sketching.

Streaming Label Learning for Modeling Labels on the Fly.

Efficient Unsupervised Dimension Reduction for Streaming Multiview Data.

Gradient Boosting on Stochastic Data Streams

Point Prediction for Streaming Data

High-Dimensional Geometric Streaming for Nearly Low Rank Data

A Framework of Online Learning with Imbalanced Streaming Data.

Lifelong Machine Learning with Deep Streaming Linear Discriminant Analysis

STREAMLINE: Streaming Active Learning for Realistic Multi-Distributional Settings

Online updating mode learning for streaming datasets

Online learning of quadratic manifolds from streaming data for nonlinear dimensionality reduction and nonlinear model reduction

A scalable supervised algorithm for dimensionality reduction on streaming data

Streaming Kernel PCA Algorithm With Small Space

Near-Optimal Streaming Heavy-Tailed Statistical Estimation with Clipped SGD

Online learning for streaming data classification in nonstationary environments

Streaming Active Learning with Deep Neural Networks

Class Imbalance Robust Incremental LPSVM for Data Streams Learning.