Movie Gen

An open source community implementation of the model from the paper "Movie Gen: A Cast of Media Foundation Models". Join our community to help implement this model!

Join the Discord | Subscribe on YouTube | Connect on LinkedIn | Follow on X.com

Movie Gen is a collection of cutting-edge foundation models developed by the Movie Gen team at Meta. These models are designed to generate high-quality 1080p HD videos with various aspect ratios and synchronized audio. Movie Gen excels at a range of tasks, including:

  • Text-to-video synthesis
  • Video personalization
  • Precise video editing based on user instructions
  • Video-to-audio generation
  • Text-to-audio generation

These models set a new state of the art in multiple video and audio generation domains and push the boundaries of what's possible in media creation. The most powerful model in the collection is a 30-billion-parameter transformer capable of generating videos up to 16 seconds long at 16 frames per second (fps). It operates with a maximum context length of 73K video tokens, allowing for highly detailed and complex media output.
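To get a feel for where that 73K-token budget goes, here is a back-of-envelope calculation in Python. The 8x temporal compression factor and the square patch grid are illustrative assumptions, not figures taken from this repository:

# Back-of-envelope: how a 16 s, 16 fps clip could fill a 73K-token context.
# The compression factor and patch-grid shape below are assumptions for
# illustration only.
seconds, fps = 16, 16
frames = seconds * fps                            # 256 raw frames
temporal_compression = 8                          # assumed TAE factor
latent_frames = frames // temporal_compression    # 32 latent frames

context = 73_000
tokens_per_frame = context // latent_frames       # ~2281 tokens per latent frame
grid_side = int(tokens_per_frame ** 0.5)          # roughly a 47x47 patch grid
print(latent_frames, tokens_per_frame, grid_side)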

Key Features

  • HD Video Generation: Outputs high-quality 1080p videos with various aspect ratios and synchronized audio.
  • Text-to-Video Synthesis: Generates fully realized videos from natural language descriptions.
  • Personalized Video Creation: Tailors videos based on user-supplied images or inputs.
  • Instruction-Based Video Editing: Allows precise control and editing of video content through instructions.
  • Audio Synthesis: Generates audio based on video content and natural language descriptions.
  • Scaling & Efficiency: Achieves high scalability through technical innovations in parallelization, architecture simplifications, and efficient data curation.
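
The repository does not yet expose an end-to-end generation pipeline, so the sketch below is purely hypothetical: every class and function name in it is invented to illustrate the kind of interface the feature list implies.

# Hypothetical interface sketch -- none of these names exist in the
# repository; they only illustrate the feature list above.
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class GenerationRequest:
    prompt: str                                 # natural-language description
    duration_s: float = 16.0                    # up to 16 seconds
    fps: int = 16
    resolution: Tuple[int, int] = (1920, 1080)  # 1080p HD output
    reference_image: Optional[str] = None       # for personalized video
    edit_instruction: Optional[str] = None      # for instruction-based editing

def generate(request: GenerationRequest):
    """Placeholder: a finished pipeline would return video and audio tensors."""
    raise NotImplementedError("end-to-end generation is not implemented yet")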

Model Overview

Model                             Parameters   Capabilities                             Max Context Length   FPS
Movie Gen Base                    5B           Text-to-Video, Video-to-Audio            18K video tokens     16
Movie Gen Pro                     15B          Personalized Video, Text-to-Video        40K video tokens     16
Movie Gen Max (state-of-the-art)  30B          Full-featured Video & Audio Generation   73K video tokens     16

Technical Innovations

  1. Architecture Simplifications: The paper introduces several architectural simplifications to scale media generation models effectively, including transformer-based structures tailored for handling video data.

  2. Latent Spaces & Training Objectives: Refined latent spaces and optimized training objectives let the models generate realistic, coherent, and high-quality outputs across multiple media modalities (a minimal sketch of the training objective appears after this list).

  3. Data Curation: The team built a highly curated, diverse dataset designed specifically for multi-modal media generation tasks.

  4. Parallelization Techniques: The models leverage advanced parallelization techniques, enabling faster training and inference.

  5. Inference Optimizations: Inference-time optimizations significantly reduce latency, making real-time video generation and editing feasible.
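
The Movie Gen paper describes training its generation models with a flow-matching objective in the TAE latent space. The sketch below is a minimal, self-contained illustration of such a training step, not the repository's implementation: ToyVelocityModel is a stand-in for the real transformer, and conditioning on text and timestep is omitted.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyVelocityModel(nn.Module):
    """Stand-in for the real video transformer; predicts a velocity field."""
    def __init__(self, channels: int = 16):
        super().__init__()
        self.net = nn.Conv3d(channels, channels, kernel_size=3, padding=1)

    def forward(self, xt, t):
        # A real model would condition on the timestep t (and on text);
        # this toy version ignores it.
        return self.net(xt)

def flow_matching_loss(model, x1):
    """One flow-matching step on clean latents x1 of shape (B, C, T, H, W)."""
    x0 = torch.randn_like(x1)                        # noise endpoint
    t = torch.rand(x1.size(0), device=x1.device).view(-1, 1, 1, 1, 1)
    xt = (1 - t) * x0 + t * x1                       # linear interpolation path
    target_velocity = x1 - x0                        # d(xt)/dt along the path
    predicted_velocity = model(xt, t.flatten())
    return F.mse_loss(predicted_velocity, target_velocity)

model = ToyVelocityModel()
latents = torch.randn(2, 16, 4, 8, 8)   # tiny fake latent batch
loss = flow_matching_loss(model, latents)
loss.backward()
print(f"flow-matching loss: {loss.item():.4f}")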

Installation

To use Movie Gen, clone the repository and install the necessary dependencies:

git clone https://github.com/kyegomez/movie-gen.git
cd movie-gen
pip install -r requirements.txt

Usage

TemporalAutoencoder (TAE)

The TAE compresses a video into a compact latent space and reconstructs it; the snippet below runs a smoke test on a dummy batch.

import torch
from loguru import logger
from movie_gen.tae import TemporalAutoencoder

def test_temporal_autoencoder():
    """
    Test the TemporalAutoencoder model with a dummy input tensor.
    This function creates a random input tensor representing a batch of videos,
    passes it through the model, and prints out the input and output shapes.
    """
    # Route loguru messages to stdout (loguru logs to stderr by default)
    logger.add(lambda msg: print(msg, end=''))

    # Instantiate the model
    model = TemporalAutoencoder(in_channels=3, latent_channels=16)

    # Create a dummy input tensor representing a batch of videos:
    # batch size B=1, T0=16 frames, C_in=3 channels (RGB), H0=64, W0=64
    B, T0, C_in, H0, W0 = 1, 16, 3, 64, 64
    x = torch.randn(B, T0, C_in, H0, W0)

    # Forward pass through the model
    recon = model(x)

    # Print the shapes
    print(f"Input shape: {x.shape}")
    print(f"Reconstructed output shape: {recon.shape}")

if __name__ == "__main__":
    test_temporal_autoencoder()
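
Run as a script, this should print an input shape of torch.Size([1, 16, 3, 64, 64]) and, if the autoencoder reconstructs at full resolution, a matching output shape; the compression the paper describes (8x along each spatiotemporal dimension) happens only in the internal latent space.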

Evaluation

The paper rigorously evaluates the models on multiple tasks, including:

  • Text-to-video generation benchmarks.
  • Video personalization accuracy.
  • Instruction-based video editing precision.
  • Audio generation quality.

Reproducing the Paper's Results

To reproduce the evaluation metrics reported in the paper, run:

python evaluate.py --model movie-gen-max --task text-to-video
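
Assuming the --model and --task flags accept the model and task names from the table above (an assumption worth checking against evaluate.py), other evaluations follow the same pattern:

python evaluate.py --model movie-gen-base --task video-to-audio
python evaluate.py --model movie-gen-pro --task personalized-video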

Contributing

Contributions are welcome! Please follow the standard GitHub flow:

  1. Fork the repository
  2. Create a new feature branch (git checkout -b feature-branch)
  3. Make your changes
  4. Submit a pull request

For a list of core contributors, see the appendix of the Movie Gen paper.

License

Movie Gen is licensed under the MIT License. See LICENSE for more information.

Contact

For questions or collaboration opportunities, please reach out to the Movie Gen community through the Discord linked above.
