Image Conductor: Precision Control for Interactive Video Synthesis

Yaowei Li, Xintao Wang, Zhaoyang Zhang, Zhouxia Wang, Ziyang Yuan, Liangbin Xie, Yuexian Zou, Ying Shan
2024-06-22
Abstract: Filmmaking and animation production often require sophisticated techniques for coordinating camera transitions and object movements, typically involving labor-intensive real-world capturing. Despite advancements in generative AI for video creation, achieving precise control over motion for interactive video asset generation remains challenging. To this end, we propose Image Conductor, a method for precise control of camera transitions and object movements to generate video assets from a single image. A well-cultivated training strategy is proposed to separate distinct camera and object motion by camera LoRA weights and object LoRA weights. To further address cinematographic variations from ill-posed trajectories, we introduce a camera-free guidance technique during inference, enhancing object movements while eliminating camera transitions. Additionally, we develop a trajectory-oriented video motion data curation pipeline for training. Quantitative and qualitative experiments demonstrate our method's precision and fine-grained control in generating motion-controllable videos from images, advancing the practical application of interactive video synthesis. Project webpage: https://liyaowei-stu.github.io/project/ImageConductor/
Computer Vision and Pattern Recognition, Artificial Intelligence, Multimedia
What problem does this paper attempt to address?
The paper addresses precise control of camera transitions and object movements in interactive video synthesis. Although generative AI for video creation has advanced, precise motion control for interactively generated video assets remains challenging: existing methods either lack a fine-grained control interface or cannot accurately control camera transitions and object movements. The researchers therefore propose Image Conductor, a method that controls both kinds of motion when generating video assets from a single image. A carefully designed training strategy separates camera motion from object motion by collaboratively optimizing two sets of weights, camera LoRA weights and object LoRA weights, so that the two can be controlled independently. To handle cinematographic variations caused by ill-posed trajectories, where it is hard to tell whether the trajectories describe camera or object motion, a camera-free guidance technique is applied during inference to enhance object movements while eliminating camera transitions. The paper also develops a trajectory-oriented video motion data curation pipeline for training. Quantitative and qualitative experiments show that Image Conductor provides precise, fine-grained motion control when generating videos from images, advancing the practical application of interactive video synthesis.
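The two mechanisms named above, separate camera and object LoRA weights and camera-free guidance at inference, can be illustrated with a toy sketch. The text does not give the guidance formula, so the extrapolation below is an assumption modeled on classifier-free guidance; the function names (`lora_linear`, `camera_free_guidance`), the `guidance_scale` parameter, and the random stand-in weights are all hypothetical and are not taken from the paper.

```python
import numpy as np

def lora_linear(x, W, A, B, scale=1.0):
    """Apply a frozen weight W plus a low-rank LoRA update scale * (B @ A)."""
    return x @ (W + scale * (B @ A)).T

def camera_free_guidance(pred_object, pred_camera, guidance_scale):
    """Assumed form: move away from the camera-branch prediction and
    toward the object-branch prediction (classifier-free-guidance style)."""
    return pred_camera + guidance_scale * (pred_object - pred_camera)

rng = np.random.default_rng(0)
d_in, d_out, rank = 8, 8, 2

# Shared frozen base weight (a stand-in for one layer of a video model).
W = rng.normal(size=(d_out, d_in))

# Two independent low-rank adapters: one for camera motion, one for object motion.
A_cam, B_cam = rng.normal(size=(rank, d_in)), rng.normal(size=(d_out, rank))
A_obj, B_obj = rng.normal(size=(rank, d_in)), rng.normal(size=(d_out, rank))

x = rng.normal(size=(4, d_in))  # stand-in for trajectory-conditioned features

pred_camera = lora_linear(x, W, A_cam, B_cam)  # branch using camera LoRA
pred_object = lora_linear(x, W, A_obj, B_obj)  # branch using object LoRA

# A scale above 1 pushes further toward object motion, away from camera motion.
guided = camera_free_guidance(pred_object, pred_camera, guidance_scale=2.0)
```

Under this assumed form, a `guidance_scale` of 0 returns the camera-branch prediction and a scale of 1 returns the object-branch prediction, so larger values emphasize object movement at the expense of camera transitions.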