ToonCrafter: Generative Cartoon Interpolation

ACM Transactions on Graphics (Special issue of SIGGRAPH Asia 2024)
¹The Chinese University of Hong Kong, ²City University of Hong Kong, ³Tencent AI Lab, ⁴Monash University


Showcases produced by our ToonCrafter


Comparisons with baseline methods

Input | AnimeInterp | EISAI | FILM | SEINE | ToonCrafter (Ours)



Cartoon sketch interpolation.

Input frames | Interpolation results

Reference-based sketch colorization (single-image-reference).

Input | Colorization results

Reference-based sketch colorization (dual-image-reference).

Input reference | Input sketch | Colorization results


Sparse-sketch-guided generation

Bisection (n=4) (the sketches of the two input cartoon frames are always given).

Input frames | Sparse sketch guidance | Interpolation results

Bisection (n=3)

Input frames | Sparse sketch guidance | Interpolation results

Bisection (n=2)

Input frames | Sparse sketch guidance | Interpolation results

Bisection (n=1)

Input frames | Sparse sketch guidance | Interpolation results
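The "Bisection (n=...)" labels above suggest that sketch guidance is supplied at frames chosen by recursively halving the clip. The sketch below illustrates one plausible reading of that pattern and is not taken from the ToonCrafter code: the function name `bisection_indices`, the 16-frame clip length, and the midpoint rounding are all assumptions made for illustration.

```python
from __future__ import annotations


def bisection_indices(num_frames: int, depth: int) -> list[int]:
    """Return sorted frame indices that receive a sketch.

    Hypothetical helper: the two endpoint frames are always included,
    then the midpoint of every interval is added, repeated `depth` times.
    """
    if num_frames < 2:
        raise ValueError("need at least two frames")

    # The sketches of the two input frames are always given.
    selected = {0, num_frames - 1}

    def split(lo: int, hi: int, level: int) -> None:
        # Stop when the recursion budget is spent or no frame lies between.
        if level == 0 or hi - lo < 2:
            return
        mid = (lo + hi) // 2  # assumed rounding: floor of the midpoint
        selected.add(mid)
        split(lo, mid, level - 1)
        split(mid, hi, level - 1)

    split(0, num_frames - 1, depth)
    return sorted(selected)


if __name__ == "__main__":
    # Assumed clip length of 16 frames, for illustration only.
    for n in (1, 2, 3, 4):
        print(n, bisection_indices(16, n))
```

Under these assumptions, n=1 adds only a middle-frame sketch, and each further level roughly doubles the number of guided frames, which matches the progression from sparse to dense guidance shown in the rows above.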




Ablation study

Toon rectification learning.

Input | I. | II. | III. | IV. (Ours) | V.

Dual-reference-based 3D VAE decoder (Reconstruction results, i.e., decoding the latents encoded by the encoder).

Input | Ours | Ours w/o P3D | Ours w/o HAR & P3D

Dual-reference-based 3D VAE decoder (Generation results, i.e., decoding the denoised latents from the generator).

Case 1: Please pay attention to the lanterns.
Case 2: Please pay attention to the newspaper.

Input starting frame | Input ending frame | Ours | Ours w/o HAR & P3D

Sparse sketch guidance.

Input frame | Sparse sketch control (middle frame) | ZeroGate | FrameIn.Enc. | w/o sketch



Our model may fail to understand the semantics of the image content (e.g., the black part should be the rigid body of the aircraft, which cannot sway with the wind).

Input starting frame | Input ending frame | Our failure case

Our model may struggle to generate convincing transition motions when objects appear or disappear in the frame.

Input starting frame | Input ending frame | Our failure case