Abstract: Video generation technologies are developing rapidly and have broad potential applications. Among these technologies, camera control is crucial for generating professional-quality videos that accurately meet user expectations. However, existing camera control methods still suffer from several limitations, including limited control precision and neglect of control over subject motion dynamics. In this work, we propose I2VControl-Camera, a novel camera control method that significantly enhances controllability while providing adjustability over the strength of subject motion. To improve control precision, we employ point trajectories in the camera coordinate system, instead of only extrinsic matrix information, as our control signal. To accurately control and adjust the strength of subject motion, we explicitly model the higher-order components of the video trajectory expansion, not merely the linear terms, and design an operator that effectively represents the motion strength. We use an adapter architecture that is independent of the base model structure. Experiments on static and dynamic scenes show that our framework outperforms previous methods both quantitatively and qualitatively.

What problem does this paper attempt to address?

This paper addresses two limitations of existing camera control methods for video generation: insufficient control precision and neglect of control over subject motion dynamics. It proposes a new camera control method, I2VControl-Camera, which aims to improve controllability significantly while letting users adjust the strength of subject motion. To improve control precision, the method uses point trajectories in the camera coordinate system as the control signal instead of relying solely on extrinsic matrix information. To control and adjust the strength of subject motion accurately, it explicitly models the higher-order components of the video trajectory expansion, not just the linear terms, and designs an operator that represents the motion strength. Together, these choices allow precise control of camera motion and user-adjustable subject motion, so the generated videos come closer to professional quality and to user expectations.

### Main Contributions

1. **Explicitly model decoupled motion representations**: 3D rigid point trajectories and motion strength, used for camera control and subject motion control respectively.
2. **Construct a data pipeline for training control signals**: Derive 3D tracking information and motion masks from RGB videos.
3. **Outperform existing methods in both static and dynamic scenes**: Perform better both quantitatively and qualitatively.

### Method Overview

- **Video Representation and Notation**: All point coordinates are defined in the camera coordinate system. The 3D world is divided into a static part and a dynamic part, where the static part corresponds to linear motion in the camera coordinate system.
- **Control Signal Construction**: The point trajectory Tλ on the camera plane is defined by calculating the linear translation of the 3D point region Ω captured in the first frame and projecting it onto 2D. To avoid suppressing motion in the nonlinear part, the paper additionally models the motion of the nonlinear part (the dynamic region in the world system) and quantifies the degree of motion dynamics at time λ by a first-order derivative with respect to time λ.
- **Data Pipeline**: The paper addresses several major gaps between real RGB video data and continuous trajectory functions: the lack of 3D information, the lack of temporal correspondence, and the lack of a dynamic/static partition. These gaps are filled with metric depth estimation and tracking methods, and the static and dynamic regions are extracted iteratively.
- **Network Structure, Training and Inference**: The paper adopts an adapter architecture, which keeps the method compatible with rapidly evolving base models. The network design allows control features to be integrated into any diffusion process, so the method can adapt to various video-generation base frameworks.

### Experimental Results

- **Visualization Results**: Show the effect of the method on pixel-level control and motion strength adjustment. When the motion strength is set to 0, the image content is almost stationary; as the motion strength increases, the main objects in the scene begin to move.
- **Comparative Experiments**: Compared with previous baseline methods (such as MotionCtrl and CameraCtrl) under the same experimental settings, the proposed method performs better in terms of control precision and motion strength adjustment.

In conclusion, by introducing new control signals and modeling methods, this paper significantly improves the precision and flexibility of camera control in video generation, especially for dynamic scenes.
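The two control signals described in the method overview can be illustrated with a small NumPy sketch. It is not the paper's implementation: the function names, the pinhole intrinsics `K`, the per-frame rotations and translations, and the finite-difference stand-in for the first-order time derivative are all assumptions made for illustration. The first part lifts first-frame pixels to 3D camera coordinates and projects their rigid motion to 2D trajectories; the second part measures how fast dynamic points move once the rigid component is removed.

```python
import numpy as np

def unproject(depth, K):
    """Lift every first-frame pixel to a 3D point in camera coordinates."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)
    rays = pix @ np.linalg.inv(K).T          # back-projected rays with z = 1
    return rays * depth.reshape(-1, 1)       # scale each ray by its metric depth

def project(points, K):
    """Pinhole projection of (N, 3) camera-space points to (N, 2) pixels."""
    uvw = points @ K.T
    return uvw[:, :2] / uvw[:, 2:3]

def rigid_trajectory(points0, rotations, translations, K):
    """2D trajectories of static points: X_t = R_t X_0 + t_t, then project."""
    frames = [project(points0 @ R.T + t, K) for R, t in zip(rotations, translations)]
    return np.stack(frames)                  # shape (T, N, 2)

def motion_strength(tracks3d, dynamic_mask, rotations, translations):
    """Hypothetical operator: mean per-frame speed of the non-rigid residual.

    tracks3d is (T, N, 3) in camera coordinates; frame 0 is assumed to have
    identity rotation and zero translation.
    """
    rigid = np.stack([tracks3d[0] @ R.T + t for R, t in zip(rotations, translations)])
    residual = (tracks3d - rigid)[:, dynamic_mask]   # (T, M, 3)
    if residual.shape[0] < 2 or residual.shape[1] == 0:
        return 0.0
    velocity = np.diff(residual, axis=0)             # finite-difference derivative
    return float(np.linalg.norm(velocity, axis=-1).mean())
```

A possible usage: call `unproject` on an estimated metric depth map of the first frame, pass the result to `rigid_trajectory` with the desired camera motion to obtain the camera control signal, and call `motion_strength` on tracked 3D points with a dynamic-region mask to obtain a scalar that is zero for a fully static scene.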

I2VControl-Camera: Precise Video Camera Control with Adjustable Motion Strength

I2VControl: Disentangled and Unified Video Motion Synthesis Control

CamI2V: Camera-Controlled Image-to-Video Diffusion Model

Unconstrained Self-Calibration of Stereo Camera on Visually Impaired Assistance Devices.

Ctrl-VIO: Continuous-Time Visual-Inertial Odometry for Rolling Shutter Cameras

Video stabilisation based on modelling of motion imaging.

MotionCtrl: A Unified and Flexible Motion Controller for Video Generation

ObjCtrl-2.5D: Training-free Object Control with Camera Poses

Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling

Image Conductor: Precision Control for Interactive Video Synthesis

Robust Camera Motion Estimation in Video Sequences

COMD: Training-free Video Motion Transfer with Camera-Object Motion Disentanglement

Camera Attributes Control for Visual Odometry With Motion Blur Awareness

On a videoing control system based on object detection and tracking

CameraCtrl: Enabling Camera Control for Text-to-Video Generation

Automatic Camera Trajectory Control with Enhanced Immersion for Virtual Cinematography

Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion

Video Stabilization for Camera Shoot in Mobile Devices via Inertial-Visual State Tracking

Trajectory Attention for Fine-grained Video Motion Control

MotionMaster: Training-free Camera Motion Transfer For Video Generation

Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention