Abstract: Structural and discriminative knowledge about a data set and the relationships within it can yield insight that is useful for data interpretation and exploration. This paper reports the development of an automated and intelligent procedure for generating a hierarchy of minimax entropy models and principal component visualization spaces for improved data explanation. The proposed hierarchical minimax entropy modeling and probabilistic principal component projection are both statistically principled and visually effective at revealing all of the interesting aspects of the data set. The methods repeatedly apply standard finite normal mixture models and probabilistic principal component projections. The strategy is that the top-level model and projection should explain the entire data set, best revealing the presence of clusters and relationships, while lower-level models and projections should display the internal structure of individual clusters, such as subclusters and attribute trends, which might not be apparent in the higher-level models and projections. With many complementary mixture models and visualization projections, each level remains relatively simple, while the complete hierarchy stays flexible and still conveys considerable structural information. In particular, a model identification procedure is developed to select the optimal number and kernel shapes of local clusters from a class of data, resulting in a standard finite normal mixture with minimum conditional bias and variance. A probabilistic principal component neural network is also advanced to generate optimal projections. Together these lead to a hierarchical visualization algorithm that allows the complete data set to be analyzed at the top level, with the best-separated subclusters of data points analyzed at deeper levels.
Hierarchical probabilistic principal component visualization involves (1) evaluation of posterior probabilities for the mixture data set, (2) estimation of multiple principal component axes from the probabilistic data set, and (3) generation of a complete hierarchy of visual projections. With a soft clustering of the data set t_i via the EM algorithm, each data point effectively belongs to more than one cluster at any given level, with posterior probabilities denoted by z_ik. Thus, the effective input values are z_ik t_i for an independent visualization space k in the hierarchy. Further projections can again be performed using the effective input values z_ik z_(j|k) t_i for the visualization subspace j, where z_(j|k) is the posterior probability of subcluster j given cluster k. The complete visual explanation hierarchy is generated by alternating two steps, principal projection (dimensionality reduction) and model identification (structure decomposition), using information theoretic criteria, the EM algorithm, and probabilistic principal component analysis.
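The posterior weighting described above can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: it assumes spherical Gaussian components with fixed, hand-picked parameters (no EM fitting and no model identification), and it reads "effective input values" as weighting each point's contribution to the sample mean and covariance by its posterior. The function names `responsibilities` and `weighted_pca` and the synthetic data are hypothetical.

```python
import numpy as np

def responsibilities(t, means, variances, priors):
    """Posteriors z[i, k] for spherical Gaussian components (one E-step)."""
    d = t.shape[1]
    sq_dist = ((t[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)  # (N, K)
    log_p = (np.log(priors)
             - 0.5 * d * np.log(2.0 * np.pi * variances)
             - 0.5 * sq_dist / variances)
    log_p -= log_p.max(axis=1, keepdims=True)  # stabilize before exponentiating
    z = np.exp(log_p)
    return z / z.sum(axis=1, keepdims=True)

def weighted_pca(t, w, n_components=2):
    """PCA in which point i contributes with nonnegative weight w[i]."""
    w = np.asarray(w, dtype=float)
    total = w.sum()
    mean = (w[:, None] * t).sum(axis=0) / total
    centered = t - mean
    cov = (w[:, None] * centered).T @ centered / total
    eigvals, eigvecs = np.linalg.eigh(cov)            # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]  # keep the largest ones
    axes = eigvecs[:, order]                          # (d, n_components)
    return mean, axes, centered @ axes

# Synthetic 5-D data (assumption): cluster A has two subclusters, cluster B has none.
rng = np.random.default_rng(0)
d = 5
centers = np.zeros((3, d))
centers[0, 0], centers[1, 0] = -2.0, 2.0   # subclusters of A
centers[2, :2] = 10.0                      # cluster B
t = np.vstack([c + 0.5 * rng.standard_normal((50, d)) for c in centers])

# Top level: one projection of the whole data set (all weights equal).
_, axes_top, proj_top = weighted_pca(t, np.ones(len(t)))

# Second level: posteriors z_ik over clusters A and B, then one view per cluster.
top_means = np.vstack([np.zeros(d), centers[2]])
z_top = responsibilities(t, top_means, np.array([1.0, 0.25]), np.array([2 / 3, 1 / 3]))
_, axes_A, proj_A = weighted_pca(t, z_top[:, 0])

# Third level: subcluster posteriors z_(j|k) inside A; effective weight is z_ik * z_(j|k).
z_sub = responsibilities(t, centers[:2], np.array([0.25, 0.25]), np.array([0.5, 0.5]))
w_sub = z_top[:, [0]] * z_sub                         # (N, 2)
sub_views = [weighted_pca(t, w_sub[:, j]) for j in range(2)]
```

Because every point carries a weight rather than a hard label, points near a cluster boundary appear in more than one view, which is the soft assignment the text describes. In a fuller pipeline the fixed parameters would be replaced by EM estimates, and the number of components at each level would be chosen by an information theoretic criterion.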

Understanding High Dimensional Spaces through Visual Means Employing Multidimensional Projections

Nonlinear Multidimensional Data Projection And Visualisation

Dimensionality Reduction and Data Visualisation

Dimension Projection Matrix/Tree: Interactive Subspace Visual Exploration and Analysis of High Dimensional Data

Deep Learning Multidimensional Projections

On Class Visualisation for High Dimensional Data: Exploring Scientific Data Sets

Exploring visual quality of multidimensional time series projections

Joint Characterization of Multiscale Information in High Dimensional Data

Visualizing Large-Scale and High-Dimensional Data

Implicit Multidimensional Projection of Local Subspaces

Exploring High-Dimensional Structure via Axis-Aligned Decomposition of Linear Projections

Spectral multidimensional scaling

Hierarchical Minimax Entropy Modeling and Probabilistic Principal Component Visualization for Data Exploration

Scalable Multivariate Volume Visualization and Analysis based on Dimension Projection and Parallel Coordinates.

The Subspace Voyager: Exploring High-Dimensional Data along a Continuum of Salient 3D Subspaces

High Dimensional Data Visualization Analysis Based on Unsupervised Laplacian Score

Subspace Data Visualization With Dissimilarity Based On Principal Angle

Improving multidimensional projection quality with user-specific metrics and optimal scaling

Visual and semantic interpretability of projections of high dimensional data for classification tasks

Exploring high-dimensional data through locally enhanced projections.