ToonCrafter: Generative Cartoon Interpolation

1 The Chinese University of Hong Kong, 2 City University of Hong Kong, 3 Tencent AI Lab

 


Showcases produced by ToonCrafter


 


Comparisons with baseline methods

Input, AnimeInterp, EISAI, FILM, SEINE, ToonCrafter (Ours)

 


Applications

Cartoon sketch interpolation.

Input frames, Interpolation results

Reference-based sketch colorization (single-image-reference).

Input, Colorization results

Reference-based sketch colorization (dual-image-reference).

Input reference, Input sketch, Colorization results

 


Sparse-sketch-guided generation

Bisection (n=4) (the sketches of the two input cartoon frames are always given).

Input frames, Sparse sketch guidance, Interpolation results
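The page does not define the bisection pattern or the meaning of n. As a minimal sketch of one plausible reading, assume n is a recursion depth: the middle frame of the clip receives a sketch, then the middle frame of each half, and so on for n levels, with the two end-frame sketches always present. The function name `bisection_sketch_indices` and the 16-frame clip length used below are illustrative assumptions, not taken from the authors' code.

```python
def bisection_sketch_indices(num_frames: int, depth: int) -> list[int]:
    """Return sorted 0-based indices of frames that receive a sketch.

    Assumption: `depth` plays the role of n in "Bisection (n=...)".
    """
    # The sketches of the two input (end) frames are always given.
    selected = {0, num_frames - 1}

    def split(lo: int, hi: int, remaining: int) -> None:
        # Stop when the depth budget is used up or no interior frame is left.
        if remaining == 0 or hi - lo < 2:
            return
        mid = (lo + hi) // 2
        selected.add(mid)
        split(lo, mid, remaining - 1)
        split(mid, hi, remaining - 1)

    split(0, num_frames - 1, depth)
    return sorted(selected)


# Hypothetical 16-frame clip: deeper bisection gives denser guidance.
for n in range(1, 5):
    print(n, bisection_sketch_indices(16, n))
```

Under these assumptions, n=1 adds only a middle-frame sketch, while n=4 covers every frame of a 16-frame clip, which would match the guidance getting sparser as n decreases in the examples on this page.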

Bisection (n=3)

Input frames, Sparse sketch guidance, Interpolation results

Bisection (n=2)

Input frames, Sparse sketch guidance, Interpolation results

Bisection (n=1)

Input frames, Sparse sketch guidance, Interpolation results

Random

Input frames, Sparse sketch guidance, Interpolation results

 


Ablation study

Toon rectification learning.

Input, I., II., III., IV. (Ours), V.

Dual-reference-based 3D VAE decoder (Reconstruction results, i.e., decoding the latents produced by the encoder).

Input, Ours, Ours w/o P3D, Ours w/o HAR & P3D

Dual-reference-based 3D VAE decoder (Generation results, i.e., decoding the denoised latents from the generator).

Case 1: Please pay attention to the lanterns.
Case 2: Please pay attention to the newspaper.

Input starting frame, Input ending frame, Ours, Ours w/o HAR & P3D

Sparse sketch guidance.

Input frame, Sparse sketch control (middle-frame), ZeroGate, FrameIn.Enc., w/o sketch

 


Limitations

Our model may fail to understand the semantics of the image content (e.g., the black part should be the rigid body of the aircraft, which cannot sway in the wind).

Input starting frame, Input ending frame, Our failure case

Our model may struggle to generate convincing transition motions when objects appear or disappear in the frame.

Input starting frame, Input ending frame, Our failure case