
Genmo unveils MCKY 1 AI video model setting new open-source standards

Genmo has released an impressive new AI video model called MCKY 1. The open-source model advances video generation with smoother character motion and stronger prompt adherence, and it is available to everyone for both personal and business use. Genmo also provides a free hosted playground for trying the model and has released the weights on Hugging Face.
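As a concrete starting point, the released weights can be pulled down with the huggingface_hub client. The sketch below is a minimal example; the repository id shown is a placeholder, not a confirmed model card name.

```python
# Minimal sketch of downloading the released weights with huggingface_hub.
# The repository id below is a placeholder for illustration only.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="genmo/mcky-1",        # hypothetical repo id
    local_dir="./mcky-1-weights",  # where the checkpoint files will land
)
print(f"Weights downloaded to {local_dir}")
```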

The creators of MCKY 1 aim to unlock the "right brain" of artificial general intelligence, the side associated with creativity. The model acts like an immersive world simulator, imagining things that might not even exist, helping AI visualize new ideas and tell stories that were previously hard to create. MCKY 1 sets a new standard for open-source video generation, competing with top closed models.


MCKY 1 excels in two key areas: prompt adherence and motion quality. It aligns closely with user prompts, allowing detailed control over characters and settings, and Genmo checks this alignment with an automated metric similar to the one used to evaluate OpenAI's DALL-E 3. MCKY 1 also raises motion quality, producing lifelike character movement.
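To make "automated metric" concrete, here is a hedged sketch of one common way to score prompt adherence: average CLIP image-text similarity over sampled frames. This is a stand-in technique for illustration, not Genmo's actual evaluation pipeline.

```python
# Illustrative prompt-adherence score: mean CLIP similarity between the text
# prompt and frames sampled from the generated video. This is NOT Genmo's
# metric, just a common automated proxy for text-video alignment.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def prompt_adherence_score(frames: list[Image.Image], prompt: str) -> float:
    """Mean cosine similarity between the prompt and each sampled frame."""
    inputs = processor(text=[prompt], images=frames,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    image_emb = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    text_emb = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    return (image_emb @ text_emb.T).mean().item()
```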

The model can generate videos at 30 frames per second, lasting up to 5.4 seconds. It maintains temporal coherence, meaning motions flow naturally across frames. MCKY 1 includes realistic physics simulations, like fluid dynamics and fur movement, making animations more believable. Human evaluators focus on motion quality, using criteria like realism and fluidity to rate performance.
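Those two numbers pin down how many frames a full-length clip contains, as the quick check below shows.

```python
# Frame count for a maximum-length clip: 30 frames per second for 5.4 seconds.
fps = 30
duration_s = 5.4
total_frames = round(fps * duration_s)
print(total_frames)  # 162 frames in a full-length clip
```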

MCKY 1 is built on a 10-billion-parameter diffusion model that uses an asymmetric diffusion transformer (ASMD) architecture, which makes the model both powerful and efficient. Alongside MCKY 1, Genmo is releasing a video variational autoencoder (VAE). The VAE compresses raw video into a much smaller latent representation, reducing the computing power needed to run MCKY 1.
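To illustrate why that compression matters, the sketch below compares how many values the diffusion model would have to process with and without a video VAE. The compression factors, latent channel count, and frame size are assumptions for illustration only, not the released VAE's actual ratios.

```python
# Rough sketch of the savings from a video VAE: raw RGB frames are mapped to a
# much smaller latent tensor that the diffusion model operates on.
# All factors and the 848x480 frame size below are illustrative assumptions.
def latent_shape(frames, height, width,
                 t_factor=6, s_factor=8, latent_channels=12):
    """Latent tensor shape for a (frames, 3, height, width) clip."""
    return (frames // t_factor, latent_channels,
            height // s_factor, width // s_factor)

raw_values = 162 * 3 * 480 * 848          # raw pixel values in a clip
f, c, h, w = latent_shape(162, 480, 848)
latent_values = f * c * h * w             # values the diffusion model sees
print(f"compression: ~{raw_values / latent_values:.0f}x fewer values")
```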

The architecture processes text prompts and video tokens jointly, with most of its capacity devoted to the visual stream. MCKY 1 handles a vast amount of video information, using learnable rotary position embeddings to keep generations coherent across the two spatial dimensions and time. The model also benefits from recent design advances such as SwiGLU feed-forward layers and query-key normalization for training stability.
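For readers unfamiliar with those terms, here is a minimal PyTorch sketch of the two building blocks mentioned: a SwiGLU feed-forward layer and a simple form of query-key normalization ahead of attention. Dimensions and placement are illustrative, not MCKY 1's exact design.

```python
# Illustrative building blocks, not MCKY 1's actual implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    """Feed-forward block: gate with SiLU, multiply, then project back down."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.w_gate = nn.Linear(dim, hidden, bias=False)
        self.w_up = nn.Linear(dim, hidden, bias=False)
        self.w_down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))

def qk_normalized_attention(q, k, v):
    """One simple QK-norm variant: normalize queries and keys so attention
    logits stay in a stable range before softmax."""
    q = F.normalize(q, dim=-1)
    k = F.normalize(k, dim=-1)
    return F.scaled_dot_product_attention(q, k, v)
```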

The Genmo team plans to release a technical paper on MCKY 1 to encourage further progress in video generation. MCKY 1 HD will come next, supporting 720p video and addressing issues like warping in complex scenes. For now, MCKY 1 has some limitations: it generates video at 480p, and it is optimized for photorealistic styles rather than animated content.

The community is expected to fine-tune the model for various styles. Genmo's MCKY 1 has already shown it can rival top models, offering users great quality and control in video creation.
