Diffutoon: High-Resolution Editable Toon Shading via Diffusion Models

Zhongjie Duan, Chengyu Wang, Cen Chen, Weining Qian, Jun Huang
2024-01-29
Abstract: Toon shading is a type of non-photorealistic rendering task in animation. Its primary purpose is to render objects with a flat and stylized appearance. As diffusion models have ascended to the forefront of image synthesis methodologies, this paper delves into an innovative form of toon shading based on diffusion models, aiming to directly render photorealistic videos into anime style. In video stylization, extant methods encounter persistent challenges, notably in maintaining consistency and achieving high visual quality. In this paper, we model the toon shading problem as four subproblems: stylization, consistency enhancement, structure guidance, and colorization. To address the challenges in video stylization, we propose an effective toon shading approach called Diffutoon. Diffutoon is capable of rendering remarkably detailed, high-resolution, and extended-duration videos in anime style. It can also edit the content according to prompts via an additional branch. The efficacy of Diffutoon is evaluated through quantitative metrics and human evaluation. Notably, Diffutoon surpasses both open-source and closed-source baseline approaches in our experiments. Our work is accompanied by the release of both the source code and example videos on GitHub (Project page:
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems Addressed by the Paper

The paper primarily aims to address the following issues:

1. **Toon Shading**:
   - Most existing methods struggle to maintain consistency and high visual quality when stylizing videos.
   - The paper breaks down the toon shading problem into four subproblems: stylization, consistency enhancement, structure guidance, and colorization.
2. **Challenges in Video Processing Based on Diffusion Models**:
   - Diffusion models lack controllability when applied to videos, making it difficult to retain the structure and lighting information of the original video.
   - Processing each frame independently causes flickering.
   - High-resolution video processing is challenging, and most models can only handle a limited number of consecutive frames.
3. **Proposed Method (Diffutoon)**:
   - By combining multiple modules (such as ControlNet and AnimateDiff), the method addresses the above challenges and generates high-quality, high-resolution, long-duration anime-style videos.
   - It provides an additional editing branch that allows video content to be edited based on prompts.

In summary, the paper aims to convert real videos directly into anime style using diffusion models, improving visual quality while maintaining video consistency.
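One challenge listed above is that most models can only handle a limited number of consecutive frames. A common, generic workaround is to process a long clip in overlapping windows and blend the overlapping outputs. The sketch below illustrates that technique under stated assumptions; it is not taken from the paper's code. `process_window` is a hypothetical stand-in for a fixed-length video model, the window size and stride are illustrative, and the blend is a plain average. In a real diffusion pipeline, such blending would more plausibly be applied to latents or noise predictions at each denoising step rather than to finished frames.

```python
import numpy as np
from typing import Callable, List, Tuple


def sliding_windows(num_frames: int, window_size: int, stride: int) -> List[Tuple[int, int]]:
    """Return [start, end) frame ranges that cover the clip with overlapping windows."""
    # stride <= window_size guarantees that no frame falls between two windows.
    assert 0 < stride <= window_size
    if num_frames <= window_size:
        return [(0, num_frames)]
    starts = list(range(0, num_frames - window_size + 1, stride))
    # Add a final window aligned to the end so the last frames are covered.
    if starts[-1] + window_size < num_frames:
        starts.append(num_frames - window_size)
    return [(s, s + window_size) for s in starts]


def process_long_video(
    frames: np.ndarray,
    window_size: int,
    stride: int,
    process_window: Callable[[np.ndarray], np.ndarray],  # hypothetical fixed-length model
) -> np.ndarray:
    """Run a fixed-length model over a long clip and average outputs where windows overlap."""
    out = np.zeros_like(frames, dtype=np.float64)
    counts = np.zeros(frames.shape[0], dtype=np.float64)
    for start, end in sliding_windows(frames.shape[0], window_size, stride):
        out[start:end] += process_window(frames[start:end])
        counts[start:end] += 1.0
    # Broadcast the per-frame window counts over the remaining (spatial/channel) axes.
    return out / counts.reshape((-1,) + (1,) * (frames.ndim - 1))
```

For example, with 11 frames, a window of 4 and a stride of 3, the windows are `(0, 4)`, `(3, 7)`, `(6, 10)` and `(7, 11)`; frames covered by more than one window receive the mean of those windows' outputs, which smooths transitions between chunks.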