This service extracts all text content from a video. The visual part is handled by splitting the video into image frames and running OCR on them (using Tika). The audio transcription is obtained from another service (such as AudioTranscriberAPI) that receives an audio file and returns its text transcription.
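At a high level, the flow can be sketched as below. The helper names `extract_frames`, `ocr_frames`, and `transcribe_audio` are hypothetical stand-ins for the real components (frame splitting, Tika OCR, and the external transcription service); the stubs only illustrate how the pieces fit together.

```python
import base64

def extract_video_text(video_b64: str) -> dict:
    """Sketch of the service pipeline: decode, OCR the frames, transcribe the audio."""
    video_bytes = base64.b64decode(video_b64)
    frames = extract_frames(video_bytes)           # split the video into image frames
    ocr_text = ocr_frames(frames)                  # OCR each frame (Tika in the real service)
    transcription = transcribe_audio(video_bytes)  # call the external transcriber service
    return {"video_ocr": ocr_text, "audio_transcription": transcription}

# Stub implementations so the sketch runs end to end; the real service
# replaces these with frame extraction, Tika calls, and an HTTP request.
def extract_frames(video_bytes):
    return [video_bytes]

def ocr_frames(frames):
    return "OCR_TEXT"

def transcribe_audio(video_bytes):
    return "TRANSCRIBED_TEXT"
```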
You will need to set the environment variables described below. After that, follow the steps:
Install packages
pip install -r requirements.txt
Go to the Flask API folder
cd ./flaskapp
Start the Flask server (e.g. http://localhost:3680)
flask run -h localhost -p 3680
To create a docker image, build it with:
docker build -t videotranscriptionocr .
Then run it, forwarding the required port:
docker run -p 3680:3680 -e TIKA_SERVER="TIKA_SERVER_HOST" \
-e TRANSCRIBE_SERVER="TRANSCRIPTION_SERVER_HOST" \
--network="host" videotranscriptionocr
For testing, it's recommended to use an API tool like Postman.
On Headers: include the key Content-Type with value application/json, as we will send the base64 video data in JSON format.
In Body: Create a JSON where the data key has the base64 mp4 data, for example:
{
    "data": "BASE64DATA"
}
Finally, in the URL field, select the POST method and send the JSON to the following address: http://localhost:3680/extract_video
If successful, it will return a JSON with code 200 and the following data:
{
    "code": 200,
    "data": {
        "video_ocr": "OCR_TEXT_EXTRACTED_FROM_VIDEO",
        "audio_transcription": "TRANSCRIBED_TEXT_FROM_VIDEO_AUDIO",
        "audiob64": "BASE64_AUDIO_DATA"
    }
}
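If you need the extracted audio back as a file, the audiob64 field can be decoded with base64. A short sketch (the response JSON here is a stand-in with the same field names as above):

```python
import base64
import json

# Stand-in for the JSON returned by /extract_video:
response_json = json.dumps({
    "code": 200,
    "data": {
        "video_ocr": "some on-screen text",
        "audio_transcription": "some spoken text",
        "audiob64": base64.b64encode(b"RIFFfake-wav").decode("ascii"),
    },
})

payload = json.loads(response_json)["data"]
audio_bytes = base64.b64decode(payload["audiob64"])
# audio_bytes can now be written to disk, e.g. open("audio.out", "wb").write(audio_bytes)
```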
Environment variables:
- TIKA_SERVER: Tika text extractor host (example: https://hub.docker.com/r/apache/tika)
- TRANSCRIBE_SERVER: Transcriber service host (example: https://github.com/rmazzine/AudioTranscriberAPI)
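For a local setup, the variables might be exported like this before starting Flask or the container (9998 is the Apache Tika server's default port; the transcriber host is a placeholder):

```shell
# Point the service at a running Tika server and transcription service.
export TIKA_SERVER="http://localhost:9998"        # Apache Tika default port
export TRANSCRIBE_SERVER="http://localhost:5000"  # placeholder transcriber host
```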