Bayesian mixture modeling for multivariate conditional distributions

Maria DeYoreo, Jerome P. Reiter

DOI: https://doi.org/10.48550/arXiv.1606.04457

2016-07-14

Abstract: We present a Bayesian mixture model for estimating the joint distribution of mixed ordinal, nominal, and continuous data conditional on a set of fixed variables. The model uses multivariate normal and categorical mixture kernels for the random variables. It induces dependence between the random and fixed variables through the means of the multivariate normal mixture kernels and via a truncated local Dirichlet process. The latter encourages observations with similar values of the fixed variables to share mixture components. Using a simulation of data fusion, we illustrate that the model can estimate underlying relationships in the data and the distributions of the missing values more accurately than a mixture model applied to the random and fixed variables jointly. We use the model to analyze consumers' reading behaviors using a quota sample, i.e., a sample where the empirical distribution of some variables is fixed by design and so should not be modeled as random, conducted by the book publisher HarperCollins.

Methodology

What problem does this paper attempt to address?

This paper addresses the problem of estimating the joint distribution of mixed data (ordinal, nominal, and continuous variables) conditional on a set of fixed variables. Specifically, it proposes a Bayesian mixture model for the conditional distribution of one set of variables (the random variables) given another set (the fixed variables). The model is particularly suited to data collected under stratified or quota sampling designs, where the empirical distributions of certain variables are fixed by design and so should not be modeled as random.

The main contributions of the paper are as follows:

1. **Model flexibility**: The paper proposes a Bayesian mixture model that handles ordinal, nominal, and continuous data together, using multivariate normal and categorical mixture kernels for the random variables.
2. **Conditional dependence modeling**: The model induces dependence between the random and fixed variables through the means of the multivariate normal mixture kernels and through a truncated local Dirichlet process, which encourages observations with similar values of the fixed variables to share mixture components.
3. **Data fusion and missing value handling**: In a simulation of data fusion, the model estimates the underlying relationships in the data and the distributions of the missing values more accurately than a mixture model applied to the random and fixed variables jointly.

The paper also applies the model to a practical case: a quota sample on consumers' reading behaviors conducted by the book publisher HarperCollins. Here the researchers aim to understand individual reading behaviors and interests, such as distinguishing the characteristics of people who own e-books from those who do not.

In conclusion, the paper aims to provide a statistical tool for estimating and analyzing the joint distribution of mixed data more accurately when some variables are fixed by design.
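The two sources of dependence on the fixed variables (kernel means that shift with the fixed variables, and mixture weights that favor nearby components) can be illustrated with a small generative sketch. This is not the authors' implementation. The summary does not give the exact form of the truncated local Dirichlet process, so the neighborhood rule below is a simplified stand-in, and every name here (`local_stick_breaking_weights`, `sample_conditional`, `radius`, the `params` dictionary) is hypothetical. The sketch assumes a single scalar fixed variable `x` in [0, 1], two continuous random variables, and one nominal random variable with three levels.

```python
import numpy as np


def local_stick_breaking_weights(x, locations, sticks, radius):
    """Mixture weights at x from truncated stick-breaking, restricted to
    components whose location lies within `radius` of x.
    Simplified stand-in for a local Dirichlet process (an assumption)."""
    near = np.abs(locations - x) < radius
    if not near.any():
        raise ValueError("no component location within radius of x")
    # Components outside the neighborhood get a zero stick, hence zero weight.
    v = np.where(near, sticks, 0.0)
    # Truncation: the last eligible stick takes all remaining mass.
    v[np.flatnonzero(near)[-1]] = 1.0
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining


def sample_conditional(x, rng, params, radius=0.3):
    """Draw (component, continuous y, nominal y) given the fixed variable x."""
    w = local_stick_breaking_weights(
        x, params["locations"], params["sticks"], radius
    )
    h = rng.choice(len(w), p=w)
    # The kernel mean depends on x: component-specific intercept and slope.
    mean = params["intercepts"][h] + params["slopes"][h] * x
    y_cont = rng.multivariate_normal(mean, params["covs"][h])
    # Categorical kernel: component-specific level probabilities.
    y_cat = rng.choice(params["cat_probs"].shape[1], p=params["cat_probs"][h])
    return h, y_cont, y_cat


rng = np.random.default_rng(0)
N = 10  # truncation level (number of mixture components)
params = {
    "locations": np.linspace(0.0, 1.0, N),       # component locations in x-space
    "sticks": rng.beta(1.0, 1.0, size=N),        # stick-breaking proportions
    "intercepts": rng.normal(size=(N, 2)),
    "slopes": rng.normal(size=(N, 2)),
    "covs": np.stack([0.25 * np.eye(2)] * N),
    "cat_probs": rng.dirichlet(np.ones(3), size=N),
}

w_low = local_stick_breaking_weights(0.1, params["locations"], params["sticks"], 0.3)
w_high = local_stick_breaking_weights(0.9, params["locations"], params["sticks"], 0.3)
h, y_cont, y_cat = sample_conditional(0.1, rng, params)
```

In this sketch, `w_low` and `w_high` place their mass on different subsets of components, so observations with similar values of `x` tend to share components while distant ones do not. Nothing is assumed about the distribution of `x` itself, which is the point of conditioning on variables fixed by design.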
