Abstract: We give an algorithm for learning a mixture of unstructured distributions. This problem arises in various unsupervised learning scenarios, for example in learning topic models from a corpus of documents spanning several topics. We show how to learn the constituents of a mixture of k arbitrary distributions over a large discrete domain [n] = {1, 2, ..., n}, together with the mixture weights, using O(n polylog n) samples. (In the topic-model learning setting, the mixture constituents correspond to the topic distributions.) This task is information-theoretically impossible for k > 1 under the usual sampling process from a mixture distribution. However, there are situations (such as the above-mentioned topic model case) in which each sample point consists of several observations from the same mixture constituent. This number of observations, which we call the "sampling aperture", is a crucial parameter of the problem. We obtain the first bounds for this mixture-learning problem without imposing any assumptions on the mixture constituents. We show that efficient learning is possible exactly at the information-theoretically least-possible aperture of 2k-1. Thus, we achieve near-optimal dependence on n and optimal aperture. While the sample size required by our algorithm depends exponentially on k, we prove that such a dependence is unavoidable when one considers general mixtures. A sequence of tools contributes to the algorithm, such as concentration results for random matrices, dimension reduction, moment estimations, and sensitivity analysis.
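The role of the sampling aperture can be illustrated with a small exact computation. The sketch below is not the paper's algorithm; it only shows, under assumed toy parameters, why one observation per sample point cannot identify a mixture. The function name `tuple_distribution` and the two example mixtures `mix_a` and `mix_b` are hypothetical and chosen for illustration.

```python
from itertools import product
from math import prod

def tuple_distribution(weights, constituents, aperture):
    """Exact distribution of one sample point: `aperture` observations
    drawn i.i.d. from a single constituent, itself chosen by `weights`."""
    n = len(constituents[0])
    dist = {}
    for obs in product(range(1, n + 1), repeat=aperture):
        # Mix, over constituents, the probability of seeing this tuple.
        dist[obs] = sum(
            w * prod(p[x - 1] for x in obs)
            for w, p in zip(weights, constituents)
        )
    return dist

# Two different mixtures over the domain [2] = {1, 2} (illustrative values).
mix_a = ([0.5, 0.5], [[1.0, 0.0], [0.0, 1.0]])  # two point masses
mix_b = ([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]])  # two identical uniform constituents

# With aperture 1 only the mixture's marginal is visible.
same_at_1 = tuple_distribution(*mix_a, 1) == tuple_distribution(*mix_b, 1)
# With aperture 2, co-occurrences within a sample point become visible.
same_at_2 = tuple_distribution(*mix_a, 2) == tuple_distribution(*mix_b, 2)
print(same_at_1, same_at_2)  # True False
```

These two mixtures happen to separate at aperture 2. The example does not demonstrate the 2k-1 bound stated in the abstract, which concerns identifying general mixtures of k constituents.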

Self-organizing mixture networks for probability density estimation

Learning nonlinear manifolds based on mixtures of localized linear manifolds under a self-organizing framework

SMLSOM: The shrinking maximum likelihood self-organizing map

Self-Organizing Mixture Networks for Representation of Grayscale Digital Images

Efficient learning of standard finite normal mixtures for image quantification

Distributed Density Estimation Based on a Mixture of Factor Analyzers in a Sensor Network

A Rigorous Link Between Self-Organizing Maps and Gaussian Mixture Models

A Hierarchical Mixture Density Network

Application of Supervised SOM Neural Network in Intrusion Detection

AMSOM: Adaptive Moving Self-organizing Map for Clustering and Visualization

Non-Euclidean Self-Organizing Maps

Optimal Clustering of Discrete Mixtures: Binomial, Poisson, Block Models, and Multi-layer Networks

Unsupervised Learning with Self-Organizing Spiking Neural Networks

MD-NOMAD: Mixture density nonlinear manifold decoder for emulating stochastic differential equations and uncertainty propagation

Self-organizing Map Algorithm Based on Intra-class Minimum Similarity Degree and Application in Reservoir Prediction

Clustering High-Dimensional Data Using Growing SOM

Predictive Uncertainty Quantification with Compound Density Networks

Nearest Neighbor Dirichlet Mixtures

Learning Mixtures of Arbitrary Distributions over Large Discrete Domains

Laplacian Mixture Modeling for Network Analysis and Unsupervised Learning on Graphs

Randomized Mixture Models for Probability Density Approximation and Estimation