Wan2.1 is a suite of video foundation models for video generation. The authors report state-of-the-art performance across multiple benchmarks, ahead of both open-source and commercial solutions. The models are optimized for consumer-grade GPUs: the T2V-1.3B model requires only 8.19 GB of VRAM, which keeps it within reach of a wide range of users. Wan2.1 supports Text-to-Video, Image-to-Video, Video Editing, Text-to-Image, and Video-to-Audio, making it a versatile tool for content creators.
Key features of Wan2.1 include:
- SOTA Performance: Reported to outperform existing open-source and commercial models across multiple benchmarks.
- Consumer-Grade GPU Support: The T2V-1.3B model needs only 8.19 GB of VRAM, so it can be deployed on consumer hardware.
- Multiple Tasks: Covers Text-to-Video, Image-to-Video, Video Editing, Text-to-Image, and Video-to-Audio.
- Visual Text Generation: Capable of generating both Chinese and English text within videos.
- Powerful Video VAE: Wan-VAE is efficient and handles 1080P videos of any length.
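
As a rough way to check the consumer-GPU claim on a given machine, the sketch below compares available VRAM against the 8.19 GB figure quoted for T2V-1.3B. It is not part of Wan2.1: the names `fits_in_vram`, `detect_vram_gb`, and the `headroom_gb` parameter are hypothetical, the detection step assumes PyTorch with CUDA is installed (it returns `None` otherwise), and it assumes the quoted figure can be compared directly with the device's total memory expressed in GiB.

```python
from typing import Optional

# VRAM figure quoted in this document for the T2V-1.3B model.
REQUIRED_VRAM_GB = 8.19


def fits_in_vram(available_gb: float,
                 required_gb: float = REQUIRED_VRAM_GB,
                 headroom_gb: float = 0.0) -> bool:
    """Return True if the available VRAM covers the requirement plus headroom."""
    return available_gb >= required_gb + headroom_gb


def detect_vram_gb(device_index: int = 0) -> Optional[float]:
    """Return total memory of a CUDA device in GiB, or None if it cannot be detected.

    Assumption: PyTorch is the runtime; this helper is illustrative only.
    """
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    props = torch.cuda.get_device_properties(device_index)
    return props.total_memory / 1024 ** 3


if __name__ == "__main__":
    vram = detect_vram_gb()
    if vram is None:
        print("No CUDA device detected; cannot check VRAM.")
    else:
        verdict = "meets" if fits_in_vram(vram) else "is below"
        print(f"Detected {vram:.2f} GiB, which {verdict} the {REQUIRED_VRAM_GB} GB figure.")
```

Total device memory is an upper bound on what a process can actually use, so passing a non-zero `headroom_gb` gives a more conservative answer.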
Wan2.1 is built on the diffusion transformer paradigm and adds a novel spatio-temporal VAE, scalable training strategies, and large-scale data construction. The model is licensed under Apache 2.0, which allows flexible use; users are still expected to follow ethical guidelines. Join the community on Discord or WeChat to connect with the research and product teams.