Kafka-PythonDataStream

Turning a static data source—YouTube’s REST API—into a reactive system that:

  • Uses Python to fetch and process data from a static web API
  • Streams that data live, from Python into a Kafka topic
  • Processes the incoming source data with ksqlDB, watching for important changes
  • Then streams out live, custom notifications via Telegram

Overview

A data processing pipeline project in which a Python script fetches information from the web, in our case the YouTube API. Once it has a snapshot of this data, it publishes it to a Kafka stream. ksqlDB stream processing then watches that stream for changes, and when a change is interesting enough, it is shipped via a Kafka connector to Telegram.
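As a rough illustration of the first two stages (fetch a playlist snapshot, publish it to a topic), the sketch below uses the YouTube Data API v3 and the confluent-kafka client. The topic name, placeholder credentials and choice of client library are assumptions for illustration, not the project's actual values:

import json
import requests
from confluent_kafka import Producer

# Illustrative placeholders only; the real values belong in config.py.
YOUTUBE_API_KEY = "YOUR_GOOGLE_API_KEY"
PLAYLIST_ID = "YOUR_PLAYLIST_ID"
TOPIC = "youtube_videos"  # assumed topic name

producer = Producer({
    "bootstrap.servers": "YOUR_CONFLUENT_BOOTSTRAP_SERVER",
    "security.protocol": "SASL_SSL",
    "sasl.mechanisms": "PLAIN",
    "sasl.username": "YOUR_CLUSTER_API_KEY",
    "sasl.password": "YOUR_CLUSTER_API_SECRET",
})

def fetch_playlist_items(page_token=None):
    # Fetch one page of playlist items from the YouTube Data API v3.
    response = requests.get(
        "https://www.googleapis.com/youtube/v3/playlistItems",
        params={
            "part": "snippet,contentDetails",
            "playlistId": PLAYLIST_ID,
            "key": YOUTUBE_API_KEY,
            "maxResults": 50,
            "pageToken": page_token,
        },
        timeout=30,
    )
    response.raise_for_status()
    return response.json()

page_token = None
while True:
    payload = fetch_playlist_items(page_token)
    for item in payload.get("items", []):
        video_id = item["contentDetails"]["videoId"]
        # Key by video id so repeated snapshots of the same video
        # land in the same partition and read as updates downstream.
        producer.produce(TOPIC, key=video_id, value=json.dumps(item))
    page_token = payload.get("nextPageToken")
    if not page_token:
        break

producer.flush()

Keying each record by video ID is what lets the downstream stream processing treat repeated snapshots of a video as updates to a single entity rather than unrelated events.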

(Architecture diagram: telegram_kafka)

Project Installation

Clone or download the project, then get API keys from Confluent Cloud (Kafka)

  • Ensure you add the right API keys, URLs, usernames and passwords to your config.py file before running (a sketch of such a file follows this list).
  • For this project, you'll need a YouTube playlist ID of your own choice and a Google API key.
  • Find the ksqlDB queries you'll use for this pipeline inside the ksql_db.sql file.
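
A minimal sketch of what config.py might contain; the variable names here are assumptions, not necessarily the names the project actually uses:

# config.py - illustrative sketch only; variable names may differ in the real file.

# Google / YouTube
GOOGLE_API_KEY = "your-google-api-key"
YOUTUBE_PLAYLIST_ID = "your-playlist-id"

# Confluent Cloud (Kafka) cluster
KAFKA_BOOTSTRAP_SERVERS = "pkc-xxxxx.region.provider.confluent.cloud:9092"
KAFKA_API_KEY = "your-cluster-api-key"
KAFKA_API_SECRET = "your-cluster-api-secret"

# Telegram bot used for the outgoing notifications
TELEGRAM_BOT_TOKEN = "your-telegram-bot-token"
TELEGRAM_CHAT_ID = "your-chat-id"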

Accessing a YouTube playlist API ID

1. Create a virtual environment

virtualenv venv

2. Activate the virtual environment

In your command prompt:

  • source venv/bin/activate - Linux
  • venv\Scripts\activate - Windows

3. Install the required packages

pip install -r requirements.txt

4. Run the application

python app.py

5. Deactivate the virtual environment

deactivate

NOTE:

  • Ensure you don't leak your API keys, URLs and passwords to the web; one way to keep them out of your repository is sketched below.
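
For example, secrets can be read from environment variables instead of being hard-coded in config.py (and config.py kept out of version control via .gitignore). A minimal sketch, with assumed variable names:

# Illustrative alternative: read secrets from the environment at runtime.
# The variable names below are assumptions, not the project's actual names.
import os

GOOGLE_API_KEY = os.environ["GOOGLE_API_KEY"]
KAFKA_API_KEY = os.environ["KAFKA_API_KEY"]
KAFKA_API_SECRET = os.environ["KAFKA_API_SECRET"]
TELEGRAM_BOT_TOKEN = os.environ["TELEGRAM_BOT_TOKEN"]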
