The system will process financial data as a real-time stream. It will consist of several components: Apache Kafka, Apache Spark, a NoSQL database, and a data-visualization tool (Power BI or Grafana). The data source will be the Finnhub.io WebSocket API.
All components, except for Finnhub.io, have been containerized to simplify project development and improve its reliability. Each container is described below, from left to right in the diagram:
- **Finnhub.io**: The data source, a free API providing real-time financial data.
- **Data-Producer container**: Connects to the Finnhub.io API via WebSocket, serializes the incoming data, and sends it to the Apache Kafka broker.
- **ZooKeeper container**: A centralized service that maintains configuration information and naming, and provides distributed synchronization and group services for Kafka.
- **Kafdrop container**: A web UI used to monitor the Apache Kafka broker.
- **Apache Kafka container**: A message broker that receives data from the Data-Producer and stores it until it is retrieved by Apache Spark.
- **Apache Spark cluster**: Consists of four containers:
  - **Main-Processor**: The container that executes the jobs: it reads data from Apache Kafka, converts it into Spark DataFrames, performs the required aggregations with PySpark, and saves the results to the Apache Cassandra container.
  - **Spark-Master**: Manages **Spark-Worker-1** and **Spark-Worker-2**, which perform the actual distributed data processing.
- **CassandraDB container**: A single-node (one-instance) Apache Cassandra NoSQL cluster that stores the processed data from the Apache Spark cluster.
- **Grafana container**: Fetches data from CassandraDB and displays it on the charts shown below.
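The containerized topology above can be sketched as a Docker Compose file. This is an illustrative sketch only: the image names, build paths, service names, and ports are assumptions, not the project's actual configuration.

```yaml
# Illustrative service topology; images, ports, and build paths are assumptions.
version: "3.8"
services:
  zookeeper:
    image: bitnami/zookeeper:latest
    ports: ["2181:2181"]
  kafka:
    image: bitnami/kafka:latest
    depends_on: [zookeeper]
    ports: ["9092:9092"]
  kafdrop:
    image: obsidiandynamics/kafdrop:latest
    depends_on: [kafka]
    ports: ["9000:9000"]
  data-producer:
    build: ./data-producer        # connects to Finnhub.io, publishes to Kafka
    depends_on: [kafka]
  spark-master:
    image: bitnami/spark:latest
  spark-worker-1:
    image: bitnami/spark:latest
    depends_on: [spark-master]
  spark-worker-2:
    image: bitnami/spark:latest
    depends_on: [spark-master]
  main-processor:
    build: ./main-processor       # PySpark job: Kafka -> aggregations -> Cassandra
    depends_on: [kafka, spark-master, cassandra]
  cassandra:
    image: cassandra:latest
    ports: ["9042:9042"]
  grafana:
    image: grafana/grafana:latest
    depends_on: [cassandra]
    ports: ["3000:3000"]
```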
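The Data-Producer's role (connect via WebSocket, serialize, publish to Kafka) can be sketched as follows. This is a minimal sketch, not the project's actual code: it assumes the third-party `websocket-client` and `kafka-python` libraries, a placeholder API token, and a `trades` topic name; the serialization step itself uses only the standard library and follows the field names Finnhub uses in its trade messages (`s`, `p`, `v`, `t`).

```python
import json
import time


def serialize_trade(raw_message: str) -> list[bytes]:
    """Convert one Finnhub WebSocket message (a JSON string whose 'data'
    field is a list of trades) into UTF-8 JSON payloads for Kafka."""
    message = json.loads(raw_message)
    payloads = []
    for trade in message.get("data", []):
        record = {
            "symbol": trade["s"],    # ticker symbol
            "price": trade["p"],     # last trade price
            "volume": trade["v"],    # trade volume
            "trade_ts": trade["t"],  # trade time, ms since epoch
            "ingest_ts": int(time.time() * 1000),
        }
        payloads.append(json.dumps(record).encode("utf-8"))
    return payloads


def main() -> None:
    # Third-party dependencies are imported lazily so the serialization
    # logic above stays stdlib-only.
    import websocket                 # pip install websocket-client
    from kafka import KafkaProducer  # pip install kafka-python

    producer = KafkaProducer(bootstrap_servers="kafka:9092")

    def on_message(ws, raw_message: str) -> None:
        for payload in serialize_trade(raw_message):
            producer.send("trades", payload)  # topic name is an assumption

    ws = websocket.WebSocketApp(
        "wss://ws.finnhub.io?token=YOUR_API_TOKEN",  # placeholder token
        on_message=on_message,
        on_open=lambda ws: ws.send(
            json.dumps({"type": "subscribe", "symbol": "AAPL"})
        ),
    )
    ws.run_forever()


if __name__ == "__main__":
    main()
```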
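The Main-Processor performs its aggregations with PySpark on streaming DataFrames. To illustrate the kind of aggregation involved without a Spark dependency, here is the same idea in plain Python: a tumbling-window, per-symbol volume-weighted average price over trade records shaped like the Data-Producer's output. The window length and field names are assumptions, and this stands in for (not reproduces) the actual PySpark job.

```python
from collections import defaultdict


def tumbling_window_avg(trades, window_ms: int = 5_000):
    """Group trades into fixed (tumbling) time windows per symbol and
    compute the volume-weighted average price of each bucket.

    `trades` is an iterable of dicts with 'symbol', 'price', 'volume',
    and 'trade_ts' (ms since epoch).
    """
    sums = defaultdict(lambda: [0.0, 0.0])  # (symbol, window_start) -> [p*v, v]
    for t in trades:
        window_start = t["trade_ts"] - t["trade_ts"] % window_ms
        bucket = sums[(t["symbol"], window_start)]
        bucket[0] += t["price"] * t["volume"]
        bucket[1] += t["volume"]
    return {key: pv / vol for key, (pv, vol) in sums.items() if vol > 0}


trades = [
    {"symbol": "AAPL", "price": 100.0, "volume": 2, "trade_ts": 0},
    {"symbol": "AAPL", "price": 110.0, "volume": 2, "trade_ts": 1_000},
    {"symbol": "AAPL", "price": 120.0, "volume": 1, "trade_ts": 6_000},
]
result = tumbling_window_avg(trades)
# -> {("AAPL", 0): 105.0, ("AAPL", 5000): 120.0}
```

In the real job, the equivalent is a `groupBy` over Spark's `window()` function on the trade timestamp column, with the result streamed to Cassandra.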