MIMO
Advanced video-to-video model by Alibaba
2024-09-26

MIMO, a generalizable model for controllable video synthesis, can Mimic anyone anywhere in complex Motions with Object interactions.
MIMO, developed by Alibaba’s Institute for Intelligent Computing, is a video synthesis model for generating realistic, controllable character videos in complex, interactive scenes. Earlier methods either require multi-view captures of the subject or struggle with pose and scene interactions; MIMO instead relies on spatial decomposition. It encodes 2D video frames into 3D spatial codes and separates components such as character, motion, and scene so that each can be controlled on its own. Users supply simple inputs, such as a single character image, a pose sequence, and a scene video, to generate lifelike animations with customizable attributes. The framework combines a layered decomposition based on hierarchical 3D depth with diffusion-based decoding, enabling scalable, generalizable, and interactive video synthesis for diverse real-world applications. The result is a unified way to produce dynamic, high-quality character videos from minimal input.
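To make the idea of depth-based decomposition more concrete, here is a minimal sketch in Python with NumPy. It is not MIMO's code. It assumes a per-pixel depth map and a character mask are already available (in practice these would come from separate estimation models), and it assumes a three-way split into character, foreground occluder, and background scene, which the description above does not spell out. The function name `split_layers_by_depth` and its parameters are hypothetical.

```python
import numpy as np


def split_layers_by_depth(frame, depth, human_mask):
    """Split one RGB frame into character, occlusion, and scene layers.

    Illustrative only; all names here are hypothetical.
    frame:      (H, W, 3) array of pixel values
    depth:      (H, W) array, smaller values are closer to the camera
    human_mask: (H, W) boolean array marking the character
    """
    if not human_mask.any():
        raise ValueError("human_mask selects no pixels")

    # Representative depth of the character (the median resists outliers).
    human_depth = np.median(depth[human_mask])

    # Assumption: non-character pixels nearer than the character are occluders.
    occlusion_mask = (~human_mask) & (depth < human_depth)

    # Everything that is neither character nor occluder is the scene.
    scene_mask = ~(human_mask | occlusion_mask)

    def layer(mask):
        # Keep pixels selected by the mask, zero out the rest.
        return np.where(mask[..., None], frame, 0)

    return {
        "human": layer(human_mask),
        "occlusion": layer(occlusion_mask),
        "scene": layer(scene_mask),
    }
```

Because the three masks partition the frame, adding the layers back together reproduces the original image. A real system would go on to encode each layer into its own latent code before decoding; this sketch covers only the splitting step.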
Artificial Intelligence
GitHub