Abstract: Given the remarkable results of motion synthesis with diffusion models, a natural question arises: how can we effectively leverage these models for motion editing? Existing diffusion-based motion editing methods overlook the prior embedded within the weights of pre-trained models, which enables manipulating the latent feature space; hence, they primarily operate on the motion space. In this work, we explore the attention mechanism of pre-trained motion diffusion models. We uncover the roles and interactions of attention elements in capturing and representing intricate human motion patterns, and carefully integrate these elements to transfer a leader motion to a follower one while maintaining the nuanced characteristics of the follower, resulting in zero-shot motion transfer. Editing features associated with selected motions allows us to confront a challenge observed in prior motion diffusion approaches, which use general directives (e.g., text, music) for editing and ultimately fail to convey subtle nuances effectively. Our work is inspired by how a monkey closely imitates what it sees while maintaining its unique motion patterns; hence we call it Monkey See, Monkey Do, and dub it MoMo. Our technique enables tasks such as synthesizing out-of-distribution motions, style transfer, and spatial editing. Furthermore, diffusion inversion is seldom employed for motions; as a result, editing efforts focus on generated motions, limiting the editability of real ones. MoMo harnesses motion inversion, extending its application to both real and generated motions. Experimental results show the advantage of our approach over the current state of the art. In particular, unlike methods tailored for specific applications through training, our approach is applied at inference time and requires no training.
Our webpage is at https://monkeyseedocg.github.io.
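
The abstract does not spell out how the attention elements are combined, so the following is only a rough illustration of the general idea of mixing attention inputs across two motions. It is a minimal sketch, assuming that queries are taken from the leader motion and keys and values from the follower motion; that assignment, the function names, the feature shapes, and the random inputs are all hypothetical and are not taken from the abstract. A real model would also apply learned projections to obtain queries, keys, and values, which are omitted here.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_attention(q_leader, k_follower, v_follower):
    """Scaled dot-product attention whose queries come from one motion and
    whose keys/values come from another.

    Inputs are lists of per-frame feature vectors. Which motion supplies
    which element is an ASSUMPTION of this sketch, not a documented detail.
    """
    d = len(q_leader[0])
    out_dim = len(v_follower[0])
    out = []
    for q in q_leader:
        # Similarity of this leader frame to every follower frame.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
            for k in k_follower
        ]
        weights = softmax(scores)
        # Each output frame is a convex combination of follower value vectors.
        out.append([
            sum(w * v[j] for w, v in zip(weights, v_follower))
            for j in range(out_dim)
        ])
    return out


def rand_feats(frames, dim, rng):
    """Hypothetical stand-in for per-frame latent features."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(frames)]


rng = random.Random(0)
q_leader = rand_feats(6, 4, rng)    # leader: 6 frames, feature dim 4
k_follower = rand_feats(8, 4, rng)  # follower: 8 frames, feature dim 4
v_follower = rand_feats(8, 4, rng)

out = mixed_attention(q_leader, k_follower, v_follower)
```

The output has one row per leader frame, and every row is a weighted average of follower value vectors. In this toy setting, that is how the result can follow the leader's temporal layout while being built only from the follower's features.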

Unsupervised decomposition of natural monkey behavior into a sequence of motion motifs

Characterizing the structure of mouse behavior using Motion Sequencing

A Real-time Multi-Subject Three Dimensional Pose Tracking System for Analyzing Social Behaviors of Non-human Primates

High-throughput unsupervised quantification of patterns in the natural behavior of marmosets

Keypoint-MoSeq: parsing behavior by linking point tracking to pose dynamics

Deep learning-based activity recognition and fine motor identification using 2D skeletons of cynomolgus monkeys

MacAction: Realistic 3D macaque body animation based on multi-camera markerless motion capture

Hierarchical Motion Understanding via Motion Programs

Development of a 3D tracking system for multiple marmosets under free-moving conditions

MonkeyPosekit: Automated Markerless 2D Pose Estimation of Monkey

Unsupervised identification of rat behavioral motifs across timescales

MarmoPose: A Deep Learning-Based System for Real-time Multi-Marmoset 3D Pose Tracking

AlphaChimp: Tracking and Behavior Recognition of Chimpanzees

Automated Tracking of Primate Behavior

Fine decomposition of rodent behavior via unsupervised segmentation and clustering of inertial signals

MacaquePose: A Novel “In the Wild” Macaque Monkey Pose Dataset for Markerless Motion Capture

MonkeyTrail: A Scalable Video-Based Method for Tracking Macaque Movement Trajectory in Daily Living Cages

Three-dimensional Surface Motion Capture of Multiple Freely Moving Pigs Using MAMMAL

Monkey See, Monkey Do: Harnessing Self-attention in Motion Diffusion for Zero-shot Motion Transfer

Development of a Marmoset Apparatus for Automated Pulling to study cooperative behaviors

Development of a Marmoset Apparatus for Automated Pulling (MarmoAAP) to Study Cooperative Behaviors