Segmentation of Multivariate Mixed Data Via Lossy Coding and Compression

Harm Derksen, Yi Ma, Wei Hong, John Wright
DOI: https://doi.org/10.1117/12.714912
2007-01-01
Abstract: In this paper, based on ideas from lossy data coding and compression, we present a simple but surprisingly effective technique for segmenting multivariate mixed data that are drawn from a mixture of Gaussian distributions or linear subspaces. The goal is to find the optimal segmentation that minimizes the overall coding length of the segmented data, subject to a given distortion. We show that deterministic segmentation minimizes an upper bound on the (asymptotically) optimal solution. The proposed algorithm does not require any prior knowledge of the number or dimension of the groups, nor does it involve any parameter estimation. Simulation results reveal intriguing phase-transition behaviors of the number of segments when changing the level of distortion or the amount of outliers. Finally, we demonstrate how this technique can be readily applied to segment real imagery and bioinformatic data.
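The idea of choosing the segmentation with the shortest total coding length can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not stated in the abstract: the function names (`coding_length`, `segmented_length`, `greedy_segment`) are hypothetical, the per-group cost uses a standard zero-mean Gaussian rate-distortion style estimate, and the greedy pairwise merging is only one plausible way to search for a short code, not necessarily the authors' procedure.

```python
import numpy as np


def coding_length(X, eps):
    """Estimated bits to code the columns of X (n x m) up to distortion eps**2.

    Assumption: zero-mean Gaussian rate-distortion style estimate,
    (m + n) / 2 * log2 det(I + n / (eps^2 m) * X X^T).
    """
    n, m = X.shape
    if m == 0:
        return 0.0
    gram = np.eye(n) + (n / (eps ** 2 * m)) * (X @ X.T)
    _, logdet = np.linalg.slogdet(gram)  # natural log of the determinant
    return 0.5 * (m + n) * logdet / np.log(2.0)


def segmented_length(groups, eps):
    """Total bits: each group's coding length plus the cost of group labels."""
    total = sum(g.shape[1] for g in groups)
    bits = 0.0
    for g in groups:
        m_i = g.shape[1]
        bits += coding_length(g, eps) - m_i * np.log2(m_i / total)
    return bits


def greedy_segment(X, eps):
    """Start from singletons; repeatedly apply the merge that saves the most bits."""
    groups = [X[:, [j]] for j in range(X.shape[1])]
    while len(groups) > 1:
        current = segmented_length(groups, eps)
        best = None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                merged = [g for k, g in enumerate(groups) if k not in (i, j)]
                merged.append(np.hstack([groups[i], groups[j]]))
                bits = segmented_length(merged, eps)
                if bits < current and (best is None or bits < best[0]):
                    best = (bits, merged)
        if best is None:  # no merge shortens the code, so stop
            break
        groups = best[1]
    return groups
```

In this sketch the only input besides the data is the distortion `eps`; the number of groups is whatever remains when no merge shortens the code, which mirrors the abstract's claim that the number of groups need not be known in advance. The exhaustive pair search is meant for small demonstrations only.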