This Streamlit application demonstrates how AI and machine learning can automate voiceover generation for videos. It processes a video, generates a narrative based on the video content, converts the narrative to audio, and then merges the audio back into the video.
Example input and output: `input.mp4` (original video) and `AI-output.mp4` (video with the generated voiceover).
- Video Processing: Converts a video into frames using OpenCV (see the frame-extraction sketch after this list).
- Narrative Generation: Utilizes OpenAI's GPT-4 Vision model to create stories or scripts based on the video frames.
- Voiceover Generation: Converts the generated text into a voiceover using OpenAI's text-to-speech API.
- Audio and Video Merging: Combines the generated voiceover with the original video.
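The frame-extraction step ("Video Processing" / "Video to Frames") might look roughly like the minimal sketch below, assuming the uploaded video has already been saved to disk. The function name and sampling interval are illustrative, not taken from the project source.

```python
import base64

import cv2

def video_to_base64_frames(video_path, every_nth=30):
    """Read a video with OpenCV and return every Nth frame as a base64-encoded JPEG."""
    capture = cv2.VideoCapture(video_path)
    frames = []
    index = 0
    while capture.isOpened():
        success, frame = capture.read()
        if not success:
            break
        if index % every_nth == 0:
            # Encode the raw frame as JPEG, then as base64 for the API payload.
            _, buffer = cv2.imencode(".jpg", frame)
            frames.append(base64.b64encode(buffer).decode("utf-8"))
        index += 1
    capture.release()
    return frames
```

Sampling every Nth frame keeps the later vision request small; a real implementation might instead sample by timestamp or scene change.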
- Load Environment Variables: Loads the necessary API keys and configuration from `.env.local`.
- Video to Frames: Converts an uploaded video into individual frames and encodes them in base64 format for processing.
- Frames to Script: Sends the video frames to OpenAI's GPT-4 Vision model to generate a narrative or script based on the content of the frames (this step and the two that follow are sketched after this list).
- Text to Audio: Converts the generated narrative into an audio file (voiceover) using OpenAI's text-to-speech service.
- Merge Audio and Video: Combines the original video with the generated voiceover to create a final product.
- Streamlit UI: Provides a user interface to upload videos, select voice options, and display the processed video.
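The middle of the pipeline (frames to script, script to audio, and the merge) could be sketched as below. The model name, prompt, frame-sampling rate, default voice, and helper names are assumptions for illustration, not the project's exact code; the sketch uses the `openai` Python client throughout, whereas the project also lists `requests` for direct HTTP calls, and the moviepy calls assume the 1.x API.

```python
from moviepy.editor import AudioFileClip, VideoFileClip
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def frames_to_script(base64_frames):
    """Ask a vision-capable GPT-4 model to write a voiceover script for the frames."""
    content = [{
        "type": "text",
        "text": "These are frames from a video. Write a short voiceover script "
                "in the style of a documentary narrator describing what happens.",
    }]
    # Send only a subset of frames to keep the request size manageable.
    for frame in base64_frames[::10]:
        content.append({
            "type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{frame}"},
        })
    response = client.chat.completions.create(
        model="gpt-4o",  # assumption: any vision-capable chat model works here
        messages=[{"role": "user", "content": content}],
        max_tokens=500,
    )
    return response.choices[0].message.content

def script_to_speech(script_text, voice="alloy", out_path="voiceover.mp3"):
    """Convert the generated script into an MP3 voiceover with OpenAI text-to-speech."""
    speech = client.audio.speech.create(model="tts-1", voice=voice, input=script_text)
    speech.write_to_file(out_path)
    return out_path

def merge_audio_and_video(video_path, audio_path, out_path="AI-output.mp4"):
    """Replace the video's audio track with the generated voiceover (moviepy 1.x API)."""
    video = VideoFileClip(video_path)
    voiceover = AudioFileClip(audio_path)
    video.set_audio(voiceover).write_videofile(out_path, codec="libx264", audio_codec="aac")
    return out_path
```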
- Set up the environment by placing the OpenAI API key in `.env.local`.
- Run the Streamlit application.
- Upload a video and select a voice type.
- Generate a voiceover and view the processed video (a minimal UI sketch follows this list).
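A minimal Streamlit front end tying the steps together might look like the sketch below, assuming the helper functions from the earlier sketches are available in the same module. The widget labels, voice list, and page title are illustrative, not the project's exact UI.

```python
import tempfile

import streamlit as st

st.title("AI Video Voiceover Generator")

uploaded = st.file_uploader("Upload a video", type=["mp4", "mov", "avi"])
voice = st.selectbox("Voice", ["alloy", "echo", "fable", "onyx", "nova", "shimmer"])

if uploaded is not None and st.button("Generate voiceover"):
    # Persist the upload so OpenCV and moviepy can read it from a file path.
    with tempfile.NamedTemporaryFile(delete=False, suffix=".mp4") as tmp:
        tmp.write(uploaded.read())
        video_path = tmp.name

    with st.spinner("Processing..."):
        frames = video_to_base64_frames(video_path)          # from the earlier sketches
        script = frames_to_script(frames)
        audio_path = script_to_speech(script, voice=voice)
        final_path = merge_audio_and_video(video_path, audio_path)

    st.video(final_path)
```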
- `dotenv`: For loading environment variables (see the sketch after this list).
- `moviepy`: For video and audio processing.
- `cv2` (OpenCV): For handling video frames.
- `openai`: For accessing OpenAI's GPT-4 and text-to-speech APIs.
- `requests`: For making HTTP requests to the OpenAI API.
- `streamlit`: For creating the web-based UI.
- `tempfile`: For handling temporary files during processing.
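A small sketch of the environment-setup step, assuming the key is stored in `.env.local` under the name `OPENAI_API_KEY` (the variable name is an assumption):

```python
import os

from dotenv import load_dotenv

# Read key/value pairs from .env.local into the process environment.
load_dotenv(".env.local")

# The OpenAI client picks up OPENAI_API_KEY from the environment automatically,
# but it can also be read explicitly if needed.
api_key = os.getenv("OPENAI_API_KEY")
```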
- An OpenAI API key is required.
- Python 3.x and the above-mentioned libraries.
This project is for demonstration purposes and showcases how AI models can be integrated with video and audio processing in Python.