Turning a static data source—YouTube’s REST API—into a reactive system that:
- Uses Python to fetch and process data from a static web API
- Streams that data live, from Python into a Kafka topic
- Processes the incoming source data with ksqlDB, watching for important changes
- Then streams out live, custom notifications via Telegram
This is a data-processing pipeline project in which a Python script fetches information from the web, in our case from the YouTube API. Once it has a snapshot of that data, it feeds it into a Kafka stream, where stream processing watches for changes; when a change is interesting enough, it is shipped via a Kafka connector to Telegram.
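To make the first two steps concrete, here is a minimal sketch of fetching a playlist snapshot and producing it to Kafka. It assumes the requests and confluent-kafka packages; the topic name, broker address, and helper names are illustrative rather than the project's actual code (see app.py for that):

```python
# Sketch: fetch a YouTube playlist snapshot and produce it to a Kafka topic.
# Assumes the requests and confluent-kafka packages; names are illustrative.
import json
import requests
from confluent_kafka import Producer

YOUTUBE_API = "https://www.googleapis.com/youtube/v3/playlistItems"

def fetch_playlist_items(api_key: str, playlist_id: str) -> list[dict]:
    """Page through the playlist and return every item's snippet."""
    items, page_token = [], None
    while True:
        params = {
            "key": api_key,
            "playlistId": playlist_id,
            "part": "snippet",
            "maxResults": 50,
        }
        if page_token:
            params["pageToken"] = page_token
        payload = requests.get(YOUTUBE_API, params=params, timeout=10).json()
        items.extend(payload.get("items", []))
        page_token = payload.get("nextPageToken")
        if not page_token:
            return items

# Assumption: a local broker; Confluent Cloud also needs SASL settings.
producer = Producer({"bootstrap.servers": "localhost:9092"})
for item in fetch_playlist_items("YOUR_GOOGLE_API_KEY", "YOUR_PLAYLIST_ID"):
    video_id = item["snippet"]["resourceId"]["videoId"]
    producer.produce("youtube_videos", key=video_id, value=json.dumps(item))
producer.flush()
```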
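At the other end of the pipeline, a notification ultimately reaches Telegram through the Bot API's sendMessage method. In the project this call is made by a Kafka connector rather than by Python, but as a sketch, the equivalent direct call looks like this (the bot token and chat ID are placeholders):

```python
# Sketch: send a notification via the Telegram Bot API's sendMessage method.
# In the pipeline this step is handled by a Kafka connector, not by app.py.
import requests

BOT_TOKEN = "YOUR_TELEGRAM_BOT_TOKEN"  # placeholder
CHAT_ID = "YOUR_CHAT_ID"               # placeholder

resp = requests.post(
    f"https://api.telegram.org/bot{BOT_TOKEN}/sendMessage",
    json={"chat_id": CHAT_ID, "text": "Video stats changed!"},
    timeout=10,
)
resp.raise_for_status()
```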
Clone or download the project, then get API keys from Confluent Cloud.
- Ensure you add the right API keys, URLs, usernames, and passwords to your config.py file before running.
- For this project, you'll need a YouTube playlist ID of your choice and a Google API key.
- Find the ksqlDB queries you'll use for this pipeline inside the ksql_db.sql file (one way to submit them is sketched after this list).
- Go to your target YouTube playlist in your browser.
- On the address bar, you will see something like this: https://www.youtube.com/watch?v=RLykC1VN7NY&list=PLFs4vir_WsTwEd-nJgVJCZPNL3HALHHpF
- The playlist ID is the characters after “list=”, so in the URL above, our playlist ID is PLFs4vir_WsTwEd-nJgVJCZPNL3HALHHpF.
- Copy the playlist ID and paste it into the youtube_playlist_id field in your config.py file (a rough sketch of config.py follows this list).
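For reference, config.py might end up looking roughly like this. Apart from youtube_playlist_id, every field name and value here is an assumption; match them to what the project's code actually reads:

```python
# config.py - a hypothetical layout; field names other than
# youtube_playlist_id are assumptions, so align them with the project's code.
google_api_key = "YOUR_GOOGLE_API_KEY"
youtube_playlist_id = "PLFs4vir_WsTwEd-nJgVJCZPNL3HALHHpF"  # from the URL above

# Confluent Cloud connection settings (placeholders, not real credentials)
kafka_config = {
    "bootstrap.servers": "YOUR_CONFLUENT_BOOTSTRAP_SERVER",
    "security.protocol": "SASL_SSL",
    "sasl.mechanisms": "PLAIN",
    "sasl.username": "YOUR_CONFLUENT_API_KEY",
    "sasl.password": "YOUR_CONFLUENT_API_SECRET",
}
```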
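As for the ksql_db.sql statements, you can paste them into the ksqlDB editor or CLI, or submit them over ksqlDB's REST API. Below is a sketch of the latter; the server URL and the stream and column names are assumptions, and the real statements live in ksql_db.sql:

```python
# Sketch: submit a DDL statement to ksqlDB's /ksql REST endpoint.
# The URL, stream name, and schema are illustrative; see ksql_db.sql.
import requests

KSQLDB_URL = "http://localhost:8088/ksql"  # assumption: a local ksqlDB server

statement = """
CREATE STREAM youtube_videos (
    video_id VARCHAR KEY,
    title VARCHAR,
    likes INTEGER,
    comments INTEGER,
    views INTEGER
) WITH (
    KAFKA_TOPIC = 'youtube_videos',
    VALUE_FORMAT = 'JSON',
    PARTITIONS = 1
);
"""

resp = requests.post(
    KSQLDB_URL,
    headers={"Accept": "application/vnd.ksql.v1+json"},
    json={"ksql": statement, "streamsProperties": {}},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```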
Create a virtual environment:

virtualenv venv

Activate it in your command prompt:
- source venv/bin/activate (Linux/macOS)
- venv\Scripts\activate (Windows)

Install the dependencies, then run the app:

pip install -r requirements.txt
python app.py

When you're finished, deactivate the virtual environment:

deactivate
NOTE:
- Ensure you don't commit or otherwise leak your API keys, URLs, and passwords to the web.
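One common way to avoid that is to keep secrets out of the repository entirely, for example by adding config.py to .gitignore, or by reading values from environment variables. A small sketch of the latter (the variable names are illustrative, not the project's actual ones):

```python
# Sketch: load secrets from environment variables instead of hard-coding
# them in config.py. Variable names here are illustrative.
import os

google_api_key = os.environ["GOOGLE_API_KEY"]          # raises KeyError if unset
telegram_bot_token = os.environ.get("TELEGRAM_BOT_TOKEN", "")
```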