MLUG: Bootstrapping Language-Motion Pre-Training for Unified Motion-Language Understanding and Generation

Hongliang Luo, Wei Xi, Daniel Tang
DOI: https://doi.org/10.3390/s24227354
IF: 3.9
2024-11-27
Sensors
Abstract: In the realm of computer vision and animation, the generation of human motion from textual descriptions represents a frontier of significant challenge and potential. This paper introduces MLUG, a groundbreaking framework poised to transform motion synthesis by harnessing the power of vision–language pre-training techniques. MLUG addresses the nuanced challenge of creating semantically rich, physically plausible, and emotionally expressive human motions through a novel integration of a unimodal encoder with motion–text contrastive loss, a motion-grounded text encoder, a motion-grounded motion decoder, and a motion length predictor. These components work in concert to align textual descriptions with dynamic motion sequences, offering an innovative solution to the limitations of existing models in open-vocabulary motion generation and emotional expressiveness. Through extensive evaluations, MLUG demonstrates unparalleled effectiveness in generating realistic and diverse motions from a broad spectrum of textual inputs, setting a new benchmark in the field.
Categories: Engineering, Electrical & Electronic; Chemistry, Analytical; Instruments & Instrumentation