Turning a static data source—YouTube’s REST API—into a reactive system that:
- Uses Python to fetch and process data from a static web API
- Streams that data live, from Python into a Kafka topic
- Processes the incoming source data with ksqlDB, watching for important changes
- Then streams out live, custom notifications via Telegram
This is a data-processing pipeline project in which a Python script fetches information from the web, in our case from the YouTube API. Once it has a snapshot of that data, it feeds it into a Kafka stream, where stream processing watches for changes; when a change is interesting enough, it is shipped via a Kafka connector to Telegram.
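To make the first two steps concrete, here is a minimal sketch of fetching a playlist snapshot and producing it to Kafka. It assumes the requests and confluent-kafka packages; the topic name, broker address, and helper names are illustrative rather than the project's actual code (see app.py for that):

```python
# Sketch: fetch a YouTube playlist snapshot and produce it to a Kafka topic.
# Assumes the requests and confluent-kafka packages; names are illustrative.
import json
import requests
from confluent_kafka import Producer

YOUTUBE_API = "https://www.googleapis.com/youtube/v3/playlistItems"

def fetch_playlist_items(api_key: str, playlist_id: str) -> list[dict]:
    """Page through the playlist and return every item's snippet."""
    items, page_token = [], None
    while True:
        params = {
            "key": api_key,
            "playlistId": playlist_id,
            "part": "snippet",
            "maxResults": 50,
        }
        if page_token:
            params["pageToken"] = page_token
        payload = requests.get(YOUTUBE_API, params=params, timeout=10).json()
        items.extend(payload.get("items", []))
        page_token = payload.get("nextPageToken")
        if not page_token:
            return items

# Assumption: a local broker; Confluent Cloud also needs SASL settings.
producer = Producer({"bootstrap.servers": "localhost:9092"})
for item in fetch_playlist_items("YOUR_GOOGLE_API_KEY", "YOUR_PLAYLIST_ID"):
    video_id = item["snippet"]["resourceId"]["videoId"]
    producer.produce("youtube_videos", key=video_id, value=json.dumps(item))
producer.flush()
```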
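At the other end of the pipeline, a notification ultimately reaches Telegram through the Bot API's sendMessage method. In the project this call is made by a Kafka connector rather than by Python, but as a sketch, the equivalent direct call looks like this (the bot token and chat ID are placeholders):

```python
# Sketch: send a notification via the Telegram Bot API's sendMessage method.
# In the pipeline this step is handled by a Kafka connector, not by app.py.
import requests

BOT_TOKEN = "YOUR_TELEGRAM_BOT_TOKEN"  # placeholder
CHAT_ID = "YOUR_CHAT_ID"               # placeholder

resp = requests.post(
    f"https://api.telegram.org/bot{BOT_TOKEN}/sendMessage",
    json={"chat_id": CHAT_ID, "text": "Video stats changed!"},
    timeout=10,
)
resp.raise_for_status()
```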
Clone or download the project, then get API keys from Confluent Cloud.
- Ensure you add the right API keys, URLs, usernames, and passwords to your config.py file before running.
- For this project, you'll need a YouTube playlist ID of your choice and a Google API key.
- Find the ksqlDB queries you'll use for this pipeline inside the ksql_db.sql file (one way to submit them is sketched after this list).
- Go to your target YouTube playlist in your browser.
- On the address bar, you will see something like this: https://www.youtube.com/watch?v=RLykC1VN7NY&list=PLFs4vir_WsTwEd-nJgVJCZPNL3HALHHpF
- The playlist ID is the characters after “list=”, so in the URL above, our playlist ID is PLFs4vir_WsTwEd-nJgVJCZPNL3HALHHpF.
- Copy the playlist ID and paste it into the youtube_playlist_id field in your config.py file (a rough sketch of config.py follows this list).
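For reference, config.py might end up looking roughly like this. Apart from youtube_playlist_id, every field name and value here is an assumption; match them to what the project's code actually reads:

```python
# config.py - a hypothetical layout; field names other than
# youtube_playlist_id are assumptions, so align them with the project's code.
google_api_key = "YOUR_GOOGLE_API_KEY"
youtube_playlist_id = "PLFs4vir_WsTwEd-nJgVJCZPNL3HALHHpF"  # from the URL above

# Confluent Cloud connection settings (placeholders, not real credentials)
kafka_config = {
    "bootstrap.servers": "YOUR_CONFLUENT_BOOTSTRAP_SERVER",
    "security.protocol": "SASL_SSL",
    "sasl.mechanisms": "PLAIN",
    "sasl.username": "YOUR_CONFLUENT_API_KEY",
    "sasl.password": "YOUR_CONFLUENT_API_SECRET",
}
```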
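As for the ksql_db.sql statements, you can paste them into the ksqlDB editor or CLI, or submit them over ksqlDB's REST API. Below is a sketch of the latter; the server URL and the stream and column names are assumptions, and the real statements live in ksql_db.sql:

```python
# Sketch: submit a DDL statement to ksqlDB's /ksql REST endpoint.
# The URL, stream name, and schema are illustrative; see ksql_db.sql.
import requests

KSQLDB_URL = "http://localhost:8088/ksql"  # assumption: a local ksqlDB server

statement = """
CREATE STREAM youtube_videos (
    video_id VARCHAR KEY,
    title VARCHAR,
    likes INTEGER,
    comments INTEGER,
    views INTEGER
) WITH (
    KAFKA_TOPIC = 'youtube_videos',
    VALUE_FORMAT = 'JSON',
    PARTITIONS = 1
);
"""

resp = requests.post(
    KSQLDB_URL,
    headers={"Accept": "application/vnd.ksql.v1+json"},
    json={"ksql": statement, "streamsProperties": {}},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```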
Create a virtual environment:

virtualenv venv

Activate it in your command prompt:
- source venv/bin/activate (Linux/macOS)
- venv\Scripts\activate (Windows)

Install the dependencies, then run the app:

pip install -r requirements.txt
python app.py

When you're finished, deactivate the virtual environment:

deactivate
NOTE:
- Ensure you don't commit or otherwise leak your API keys, URLs, and passwords to the web.
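One common way to avoid that is to keep secrets out of the repository entirely, for example by adding config.py to .gitignore, or by reading values from environment variables. A small sketch of the latter (the variable names are illustrative, not the project's actual ones):

```python
# Sketch: load secrets from environment variables instead of hard-coding
# them in config.py. Variable names here are illustrative.
import os

google_api_key = os.environ["GOOGLE_API_KEY"]          # raises KeyError if unset
telegram_bot_token = os.environ.get("TELEGRAM_BOT_TOKEN", "")
```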