ToonCrafter: Generative Cartoon Interpolation

ACM Transactions on Graphics (Special issue of SIGGRAPH Asia 2024)

Jinbo Xing¹, Hanyuan Liu², Menghan Xia³, Yong Zhang³, Xintao Wang³, Ying Shan³, Tien-Tsin Wong¹,⁴

¹The Chinese University of Hong Kong, ²City University of Hong Kong, ³Tencent AI Lab, ⁴Monash University

Showcases produced by ToonCrafter

Comparisons with baseline methods

Panels shown for each of the nine comparison examples:

Input	AnimeInterp	EISAI	FILM	SEINE	ToonCrafter (Ours)

Applications

Cartoon sketch interpolation.

Input frames	Interpolation results	Input frames	Interpolation results

Reference-based sketch colorization (single-image-reference).

Input	Colorization results	Input	Colorization results

Reference-based sketch colorization (dual-image-reference).

Input reference	Input sketch	Colorization results	Input reference	Input sketch	Colorization results

Sparse-sketch-guided generation

Bisection (n=4) (the sketches of the two input cartoon frames are always given).

Input frames	Sparse sketch guidance	Interpolation results	Input frames	Sparse sketch guidance	Interpolation results

Bisection (n=3)

Input frames	Sparse sketch guidance	Interpolation results	Input frames	Sparse sketch guidance	Interpolation results

Bisection (n=2)

Input frames	Sparse sketch guidance	Interpolation results	Input frames	Sparse sketch guidance	Interpolation results

Bisection (n=1)

Input frames	Sparse sketch guidance	Interpolation results	Input frames	Sparse sketch guidance	Interpolation results

Random

Input frames	Sparse sketch guidance	Interpolation results	Input frames	Sparse sketch guidance	Interpolation results
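
The bisection settings above can be illustrated with a minimal sketch. It assumes that n is a recursion depth: the two input frames always carry a sketch, and each level adds the midpoint of every remaining segment. This page does not define n, so that reading, the 16-frame clip length, and the function name `bisection_indices` are all assumptions made for illustration, not the authors' implementation.

```python
def bisection_indices(num_frames: int, depth: int) -> list[int]:
    """Return sorted frame indices that would receive a sketch.

    Hypothetical helper: the endpoints are always selected, then the
    midpoint of each segment is added recursively, `depth` levels deep.
    """
    if num_frames < 2:
        raise ValueError("need at least a starting and an ending frame")
    if depth < 0:
        raise ValueError("depth must be non-negative")

    selected = {0, num_frames - 1}       # sketches of both input frames
    segments = [(0, num_frames - 1)]
    for _ in range(depth):
        next_segments = []
        for lo, hi in segments:
            if hi - lo < 2:              # no frame strictly in between
                continue
            mid = (lo + hi) // 2
            selected.add(mid)
            next_segments += [(lo, mid), (mid, hi)]
        segments = next_segments
    return sorted(selected)


if __name__ == "__main__":
    # Assumed clip length of 16 frames.
    for n in (1, 2, 3, 4):
        print(n, bisection_indices(16, n))
```

Under these assumptions, n=1 guides only the middle frame in addition to the two inputs, and each larger n roughly doubles the number of guided frames, so n=4 supplies a sketch for every frame of a 16-frame clip.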

Ablation study

Toon rectification learning.

Input	I.	II.	III.	IV. (Ours)	V.

Dual-reference-based 3D VAE decoder (Reconstruction results, i.e., decoding the latents produced by the encoder).

Input	Ours	Ours_{w/o P3D}	Ours_{w/o HAR & P3D}

Dual-reference-based 3D VAE decoder (Generation results, i.e., decoding the denoised latents from the generator).

Case 1: Please pay attention to the lanterns.
Case 2: Please pay attention to the newspaper.

Input starting frame	Input ending frame	Ours	Ours_{w/o HAR & P3D}

Sparse sketch guidance.

Input frame	Sparse sketch control (middle-frame)	ZeroGate	FrameIn.Enc.	w/o sketch

Limitations

Our model may fail to understand the semantics of the image content (e.g., the black part is the rigid body of the aircraft, which should not sway with the wind).

Input starting frame	Input ending frame	Our failure case

Our model may struggle to generate convincing transition motions when objects appear or disappear in the frame.

Input starting frame	Input ending frame	Our failure case