
Meta Movie Gen: Reimagining Content Creation

See how Meta Movie Gen is redefining video and sound creation. Learn how this model offers precise video editing and supports personalized media creation.

Whether you’re an aspiring filmmaker or a content creator who enjoys making videos for your audience, having AI tools that expand your creativity is always helpful. Recently, Meta launched its latest generative video model, known as Meta Movie Gen. 

The global generative AI market in media and entertainment is projected to reach $11.57 billion by 2033, with companies like Runway, OpenAI, and Meta leading the way in groundbreaking innovations. Meta Movie Gen, in particular, is great for applications like filmmaking, video content creation, and digital storytelling, making it easier than ever to bring creative visions to life through high-quality, AI-generated videos. In this article, we’ll explore Meta Movie Gen and how it works. We’ll also take a closer look at some of its applications. Let’s get started!

Fig 1. A frame of a video clip generated using Meta Movie Gen.

What is Meta Movie Gen?

Before we discuss what Meta Movie Gen is, let’s take a look at how it came to be. 

Meta’s research efforts in generative AI began with its Make-A-Scene series of models. This research focused on a multimodal generative AI method that helps artists and visionaries bring their imagination to life: artists can provide images, audio, video, or 3D animations as input to guide the generated output. The next leap in innovation came with diffusion models such as Emu, Meta’s image foundation model, which made it possible to generate images and videos of much higher quality and also enabled image editing.

Fig 2. An example of using Make-A-Scene’s sketch and text input to get a generated image.

Movie Gen is Meta’s latest contribution to generative AI research. It combines all of the previously mentioned modalities and allows fine-grained control, so people can use the models in more creative ways. Meta Movie Gen is a collection of foundation models for generating different types of media, including text-to-video, text-to-audio, and text-to-image. It consists of four models, trained on a combination of licensed and publicly available datasets.

Here’s a quick overview of these models:

  • Movie Gen Video model: A 30-billion-parameter model that generates high-quality videos from text prompts. 
  • Movie Gen Audio model: A 13-billion-parameter model that can create soundtracks that sync with video content. 
  • Personalized Movie Gen Video model: It generates videos of specific individuals based on a text prompt and a single image, retaining their likeness. 
  • Movie Gen Edit model: Allows detailed, text-based video edits on both real and generated videos. 

Training the Meta Movie Gen Video Model

Several key processes went into creating and training the Movie Gen Video model. The first step was collecting and preparing visual data, including images and video clips, primarily of human activities, filtered for quality, motion, and relevance. The data was then paired with text captions explaining what was happening in each scene. The captions, generated using Meta’s LLaMa3-Video model, provided rich details about the content of each scene, enhancing the model’s visual storytelling capabilities.

Fig 3. An overview of the Movie Gen Video model’s pre-training data curation pipeline.
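To make these curation steps concrete, here is a minimal sketch of what such a filtering pass could look like. The thresholds, field names, and helper functions below are illustrative assumptions, not Meta’s actual pipeline:

```python
# A hypothetical data curation pass, loosely mirroring the steps above:
# filter clips for quality and motion, then pair survivors with captions.
# All thresholds and helpers here are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Clip:
    path: str
    quality: float  # e.g. an aesthetic/sharpness score in [0, 1]
    motion: float   # e.g. mean optical-flow magnitude, normalized

def curate(clips, captioner, min_quality=0.6, min_motion=0.2):
    """Keep clips that pass both filters and attach a generated caption."""
    curated = []
    for clip in clips:
        if clip.quality >= min_quality and clip.motion >= min_motion:
            curated.append({"path": clip.path, "caption": captioner(clip.path)})
    return curated

# Usage with a stand-in captioner (the real pipeline used LLaMa3-Video):
sample = [Clip("a.mp4", 0.8, 0.5), Clip("b.mp4", 0.3, 0.9)]
print(curate(sample, captioner=lambda p: f"caption for {p}"))
```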

The training process began with the model learning to transform text into low-resolution images. It then progressed to creating full video clips through a combination of text-to-image and text-to-video training, using increasingly high-quality visuals. 

A tool called the Temporal Autoencoder (TAE) compressed the videos so that large volumes of data could be handled efficiently. Fine-tuning further sharpened the video quality, and a method called model averaging, which combines the weights of several fine-tuned models into a single model, ensured smoother, more consistent results. Finally, the video, initially generated at 768p, was upscaled to a sharp 1080p resolution using a spatial upsampler, a technique that increases image resolution by adding pixel data for clearer visuals. The result was high-quality, detailed video output.
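As a concrete illustration of the model-averaging step, the snippet below averages checkpoint weights element-wise. This is a generic sketch of the technique, assuming PyTorch state dicts with matching keys, not Meta’s exact recipe:

```python
# A minimal sketch of checkpoint weight averaging: given several
# fine-tuned checkpoints (as PyTorch state dicts with identical keys),
# produce one set of averaged weights. Generic technique, not Meta's code.
import torch

def average_checkpoints(state_dicts):
    averaged = {}
    for key in state_dicts[0]:
        stacked = torch.stack([sd[key].float() for sd in state_dicts])
        averaged[key] = stacked.mean(dim=0)
    return averaged

# Example with toy "checkpoints":
ckpts = [{"w": torch.tensor([1.0, 2.0])}, {"w": torch.tensor([3.0, 4.0])}]
print(average_checkpoints(ckpts))  # {'w': tensor([2., 3.])}
```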

Exploring Meta Movie Gen Capabilities

The Meta Movie Gen models primarily support four different abilities. Let’s take a closer look at each of them.

Video and Audio Generation

Meta Movie Gen can generate high-quality videos. These video clips can be up to 16 seconds long and run at 16 fps (frames per second), creating realistic visuals that capture motion, interactions, and camera angles from text prompts. Paired with the 13-billion-parameter audio model, it can produce synced audio, including ambient sounds, Foley effects, and music, to match the visuals. 

This setup ensures a seamless, lifelike experience, where both visuals and audio stay aligned and realistic across various scenes and prompts. For instance, these models were used to create video clips of Moo Deng, the viral pygmy hippopotamus from Thailand.

Fig 4. A frame of a video clip of Moo Deng made using Meta’s Movie Gen.
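For a sense of scale, the clip length and frame rate above pin down the exact number of frames a maximum-length clip contains, and an assumed audio sample rate shows how many samples the soundtrack must cover to stay in sync (48 kHz is our assumption here, not a published spec):

```python
# A/V alignment arithmetic for a maximum-length clip: 16 s at 16 fps is
# 256 frames; at an assumed 48 kHz sample rate, the matching soundtrack
# spans 768,000 samples, i.e. 3,000 samples per video frame.
duration_s, fps, sample_rate = 16, 16, 48_000
n_frames = duration_s * fps
n_samples = duration_s * sample_rate
print(n_frames, n_samples, n_samples // n_frames)  # 256 768000 3000
```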

Personalized Video Generation

Another interesting capability of Meta Movie Gen is personalized video generation. Users provide a photo of a person along with a text prompt describing the desired scene, and the model generates a video that features the reference person while incorporating the rich visual details specified in the prompt. The model uses both inputs (image and text) to preserve the person’s unique appearance and natural body movements while accurately following the scene described.

Fig 5. An example of the model's personalized video generation capability.
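Since Movie Gen is a research release without a public API, the following is a purely hypothetical sketch of the two inputs this capability takes; every name in it is an assumption made for illustration:

```python
# Hypothetical input bundle for personalized generation: one reference
# image plus a scene prompt. Movie Gen has no public API; all names
# here are illustrative only.
from dataclasses import dataclass

@dataclass
class PersonalizedVideoRequest:
    reference_image: str   # path to a single photo of the person
    prompt: str            # scene the video should depict
    duration_s: int = 16   # clips run up to 16 seconds

request = PersonalizedVideoRequest(
    reference_image="artist.jpg",
    prompt="the person sketches at an easel on a windy beach at sunset",
)
# video = movie_gen.generate_personalized(request)  # hypothetical call
```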

Precise Video Editing 

Using the Movie Gen Edit model, users can provide both a video clip and a text prompt as input to edit the video in creative ways. The model combines video generation with advanced image editing to perform very specific edits, such as adding, removing, or replacing elements. It can also perform global changes like modifying the background of the video clip or the overall style. But what makes the model truly unique is its precision: it can target only the specific pixels that require editing and leave the rest untouched. This preserves the original content as much as possible. 

Fig 6. Various examples of the Movie Gen Edit model’s video editing capabilities.
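The pixel-level precision described above is, in spirit, a masked blend: generated content replaces the frame only where an edit mask is set. The sketch below shows that general idea with NumPy; it illustrates the concept, not Meta’s implementation:

```python
# Conceptual sketch of mask-targeted editing: blend generated pixels into
# the original frame only where the edit mask is nonzero, leaving all
# other pixels untouched. Illustrative, not Meta's actual method.
import numpy as np

def apply_masked_edit(original, generated, mask):
    """original, generated: (H, W, 3) float arrays; mask: (H, W) in [0, 1]."""
    mask3 = mask[..., None]  # broadcast the mask over RGB channels
    return mask3 * generated + (1.0 - mask3) * original

# Toy example: edit only the left column of a 2x2 frame.
orig = np.zeros((2, 2, 3))
gen = np.ones((2, 2, 3))
mask = np.array([[1.0, 0.0], [1.0, 0.0]])
print(apply_masked_edit(orig, gen, mask)[..., 0])  # [[1. 0.] [1. 0.]]
```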

Meta Movie Gen’s Benchmarking Tools

Along with the generative AI models, Meta also introduced Movie Gen Bench, a suite of benchmarking tools for testing the performance of generative AI models. It comes with two main tools: Movie Gen Video Bench and Movie Gen Audio Bench. Both are designed to test different aspects of video and audio generation.

Here’s a glimpse of both tools:

  • Movie Gen Video Bench: It consists of 1003 prompts covering a wide variety of testing categories, such as human activities, animals, natural scenery, and physics, as well as unusual subjects and activities. What makes this evaluation benchmark especially valuable is its coverage of motion levels, which ensures that the video generation model is tested for both fast-paced and slower sequences.
  • Movie Gen Audio Bench: It is designed to test audio generation capabilities across 527 prompts. These prompts are paired with generated videos to assess how well the model can synchronize sound effects and music with visual content.

Fig 7. The diagram shows a breakdown of the evaluation prompts, with a list of concepts on the left and a word cloud of commonly used nouns and verbs on the right.
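If you wanted to audit such a prompt set yourself, a tally like the one below would do it. The file name and field names are assumptions for illustration; check the released benchmark for its actual schema:

```python
# A sketch for tallying prompt categories in an evaluation set, assuming
# a JSONL file with a "category" field per record. The file name and
# schema here are assumptions, not Movie Gen Bench's published format.
import json
from collections import Counter

counts = Counter()
with open("movie_gen_video_bench.jsonl") as f:
    for line in f:
        counts[json.loads(line)["category"]] += 1

for category, n in counts.most_common():
    print(f"{category}: {n}")
```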

A Practical Application of Meta Movie Gen

Now that we’ve covered what the Meta Movie Gen models are and how they work, let’s explore one of their practical applications. 

Movie Gen AI Innovations in Filmmaking

One of the most exciting uses of Meta's Movie Gen is how it can transform filmmaking through AI-powered video and audio creation. With Movie Gen, creators can generate high-quality visuals and sounds from simple text prompts, opening up new ways to tell stories. 

In fact, Meta teamed up with Blumhouse and a group of filmmakers, gathering their feedback on how Movie Gen can best support the creative process. Filmmakers like Aneesh Chaganty, the Spurlock Sisters, and Casey Affleck tested the tool's ability to capture mood, tone, and visual direction. They discovered that the models helped spark fresh ideas.

This pilot program has shown that while Movie Gen doesn’t replace traditional filmmaking, it offers directors a new way to experiment with visual and audio elements quickly and creatively. The filmmakers also appreciated how the tool’s editing features let them play with background sounds, effects, and visual styles more freely. 

Fig 8. A frame of a short film created using Meta Movie Gen.

Key Takeaways

Meta Movie Gen is a step forward in using generative AI to make high-quality videos and sounds from simple text descriptions. The tool helps users easily create realistic and custom videos. With capabilities like precise video editing and personalized media generation, Meta Movie Gen offers a flexible toolset that opens up fresh possibilities for storytelling, filmmaking, and beyond. By making it easier to create detailed and useful visuals, Meta Movie Gen is transforming how videos are made and used across different fields and setting a new standard for AI-driven content creation.

To learn more, visit our GitHub repository, and engage with our community. Explore AI applications in self-driving cars and agriculture on our solutions pages. 🚀

