Abstract: We describe a method for articulated human detection and human pose estimation in static images based on a new representation of deformable part models. Rather than modeling articulation using a family of warped (rotated and foreshortened) templates, we use a mixture of small, nonoriented parts. We describe a general, flexible mixture model that jointly captures spatial relations between part locations and co-occurrence relations between part mixtures, augmenting standard pictorial structure models that encode just spatial relations. Our models have several notable properties: 1) they efficiently model articulation by sharing computation across similar warps, 2) they efficiently model an exponentially large set of global mixtures through composition of local mixtures, and 3) they capture the dependency of global geometry on local appearance (parts look different at different locations). When relations are tree structured, our models can be efficiently optimized with dynamic programming. We learn all parameters, including local appearances, spatial relations, and co-occurrence relations (which encode local rigidity), with a structured SVM solver. Because our model is efficient enough to be used as a detector that searches over scales and image locations, we introduce novel criteria for evaluating pose estimation and human detection, both separately and jointly. We show that currently used evaluation criteria may conflate these two issues. Most previous approaches model limbs with rigid and articulated templates that are trained independently of each other, while we present an extensive diagnostic evaluation that suggests that flexible structure and joint training are crucial for strong performance. We present experimental results on standard benchmarks that suggest our approach is the state-of-the-art system for pose estimation, improving past work on the challenging Parse and Buffy datasets while being orders of magnitude faster.
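
The abstract notes that tree-structured relations allow efficient optimization with dynamic programming. As an illustration only, the following is a minimal sketch of generic max-sum dynamic programming on a tree, not the authors' implementation. It assumes each part has a small set of candidate states (for example, a location paired with a mixture type), a local score per state, and a pairwise score table between each part and its parent. The function name `best_configuration`, the data layout, and the toy scores are all hypothetical.

```python
def best_configuration(parent, unary, pairwise):
    """Generic max-sum dynamic programming on a tree (illustrative sketch).

    parent[i]   : index of part i's parent; part 0 is the root (parent -1).
                  Assumes every parent index is smaller than its child's.
    unary[i]    : list of local scores, one per candidate state of part i.
    pairwise[i] : table with pairwise[i][s][sp] scoring state s of part i
                  against state sp of its parent (unused for the root).
    Returns (best total score, list with the chosen state of each part).
    """
    n = len(parent)
    score = [list(u) for u in unary]   # accumulated subtree scores
    argbest = [None] * n               # best child state per parent state

    # Leaves-to-root pass: each part sends its best score to its parent.
    for i in range(n - 1, 0, -1):
        p = parent[i]
        argbest[i] = []
        for sp in range(len(score[p])):
            cands = [score[i][s] + pairwise[i][s][sp]
                     for s in range(len(score[i]))]
            best_s = max(range(len(cands)), key=cands.__getitem__)
            argbest[i].append(best_s)
            score[p][sp] += cands[best_s]

    # Root-to-leaves pass: read off the maximizing state of every part.
    states = [0] * n
    states[0] = max(range(len(score[0])), key=score[0].__getitem__)
    for i in range(1, n):
        states[i] = argbest[i][states[parent[i]]]
    return score[0][states[0]], states


# Toy example: a root with two children, two candidate states per part.
parent = [-1, 0, 0]
unary = [[0.0, 1.0], [2.0, 0.0], [0.0, 0.5]]
pairwise = [
    None,
    [[0.0, -3.0], [0.0, 0.0]],   # child 1 state vs. root state
    [[0.0, 0.0], [0.0, 2.0]],    # child 2 state vs. root state
]
best_score, best_states = best_configuration(parent, unary, pairwise)
```

With K parts and S candidate states per part, this sketch costs O(K·S²) rather than the O(S^K) of exhaustive enumeration, which is the kind of saving that tree structure makes possible.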

Mixture Dense Regression for Object Detection and Human Pose Estimation

Background modeling using mixture of Gaussians and Laplacian pyramid decomposition.

Robust Head-Pose Estimation Based on Partially-Latent Mixture of Linear Regressions

GMDN: A Lightweight Graph-Based Mixture Density Network for 3D Human Pose Regression.

MPLBoost-based Mixture Model for Effective Human Detection with Deformable Part Model

Articulated Human Detection with Flexible Mixtures of Parts

A Mixed Classification-Regression Framework for 3D Pose Estimation from 2D Images

Occlusion-aware Hand Pose Estimation Using Hierarchical Mixture Density Network

Articulated pose estimation with flexible mixtures-of-parts

A Stochastic-Geometrical Framework for Object Pose Estimation Based on Mixture Models Avoiding the Correspondence Problem

Occlusion-Aware Human Pose Estimation with Mixtures of Sub-Trees

Learning to Predict Diverse Human Motions from a Single Image via Mixture Density Networks

Multi-Channel Adaptive Mixture Background Model for Real-time Tracking.

Object detection in remote sensing imagery using a discriminatively trained mixture model

Probabilistic Rotation Modeling Based on Directional Mixture Density Networks

MMDA: Multi-person marginal distribution awareness for monocular 3D pose estimation

Spatial Mixture Models with Learnable Deep Priors for Perceptual Grouping

Multiple Object Tracking with Mixture Density Networks for Trajectory Estimation

LaPose: Laplacian Mixture Shape Modeling for RGB-Based Category-Level Object Pose Estimation

CVAM-Pose: Conditional Variational Autoencoder for Multi-Object Monocular Pose Estimation