AnimateAnything: Consistent and Controllable Animation for Video Generation

Guojun Lei, Chi Wang, Hong Li, Rong Zhang, Yikai Wang, Weiwei Xu
2024-11-17
Abstract: We present a unified controllable video generation approach AnimateAnything that facilitates precise and consistent video manipulation across various conditions, including camera trajectories, text prompts, and user motion annotations. Specifically, we carefully design a multi-scale control feature fusion network to construct a common motion representation for different conditions. It explicitly converts all control information into frame-by-frame optical flows. Then we incorporate the optical flows as motion priors to guide final video generation. In addition, to reduce the flickering issues caused by large-scale motion, we propose a frequency-based stabilization module. It can enhance temporal coherence by ensuring the video's frequency domain consistency. Experiments demonstrate that our method outperforms the state-of-the-art approaches. For more details and videos, please refer to the webpage: https://yu-shaonian.github.io/Animate_Anything/
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper aims to achieve precise and consistent control over video generation under different control signals, such as camera trajectories, text prompts, and user motion annotations. Existing methods struggle to achieve precise control under large-scale camera and object motion. When several control signals are applied at once, they are prone to control-signal conflicts, video flickering, and incoherence.

The paper proposes a method named AnimateAnything. It converts all control signals into a unified inter-frame optical flow representation and introduces a frequency stabilization module during generation to enhance the temporal consistency of the video. According to the authors, this improves the quality of the generated video and makes the generation process more flexible and controllable.

The main contributions of the paper include:
- A two-stage video generation framework. In the first stage, all control signals are uniformly converted into inter-frame optical flow; in the second stage, these optical flows guide the final video generation.
- An adaptive feature refinement technique that suppresses instability and flickering in the generated video by modifying its temporal features.
- Extensive quantitative and qualitative experiments, in which the method is reported to generate significantly higher-quality video than existing methods.

The method is relevant to video generation applications such as film production and virtual reality, where higher-quality and more controllable video content is needed.
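
The two ideas summarized above, optical flow as a shared motion representation and frequency-domain stabilization, can be illustrated with a toy NumPy sketch. This is not the paper's network or its stabilization module: the function names `warp_with_flow` and `temporal_lowpass` are hypothetical, the warp uses nearest-neighbour sampling, and the stabilizer is a plain hard low-pass filter along the time axis, chosen only to show what "flow as a motion prior" and "frequency-domain consistency" mean in the simplest case.

```python
import numpy as np


def warp_with_flow(frame, flow):
    """Backward-warp a (H, W) frame with a (H, W, 2) flow field.

    Assumed convention (illustrative only): flow[y, x] = (dx, dy), and the
    output pixel at (x, y) is read from (x - dx, y - dy) in the input frame,
    using nearest-neighbour sampling and clamping at the image border.
    """
    h, w = frame.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.rint(xs - flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(ys - flow[..., 1]).astype(int), 0, h - 1)
    return frame[src_y, src_x]


def temporal_lowpass(video, keep):
    """Zero every temporal frequency bin above `keep` in a (T, H, W) video.

    A generic flicker-reduction filter, standing in for (not reproducing)
    the frequency-based stabilization described in the abstract.
    """
    spectrum = np.fft.rfft(video, axis=0)   # per-pixel spectrum over time
    spectrum[keep + 1:] = 0                 # drop the high-frequency bins
    return np.fft.irfft(spectrum, n=video.shape[0], axis=0)


# A single bright pixel moved one pixel to the right by a constant flow.
frame = np.zeros((4, 4))
frame[1, 1] = 1.0
flow = np.zeros((4, 4, 2))
flow[..., 0] = 1.0                          # dx = 1, dy = 0 everywhere
warped = warp_with_flow(frame, flow)

# A constant-brightness clip with frame-to-frame flicker of +/- 0.5.
t = np.arange(8)
video = (1.0 + 0.5 * (-1.0) ** t)[:, None, None] * np.ones((8, 2, 2))
smoothed = temporal_lowpass(video, keep=1)

print("bright pixel now at:", np.argwhere(warped == 1.0).tolist())
print("flicker std before/after:", video.std(), smoothed.std())
```

In this sketch a camera pan or a user drag would both reduce to a dense `flow` array of the same shape, which is the sense in which optical flow can serve as a common representation for different conditions. The hard cutoff in `temporal_lowpass` also removes legitimate fast motion, so it should be read as an illustration of the principle rather than a usable stabilizer.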