PeteCastle/2023-Data-Engineering-Zoomcamp

Data Engineering Zoomcamp

See the full course here

Overview

Architecture diagram

Technologies Used

  • Google Cloud Platform (GCP): Google's cloud computing platform. Used only in Week 1; from Week 2 onward, Azure is used instead because I did not have access to GCP.

    • Google Cloud Storage (GCS): Data Lake
    • BigQuery: Data Warehouse
  • Azure: Microsoft's cloud computing platform. Used in Weeks 2-4.

    • Azure Data Lake Storage Gen2 (ADLS): Data Lake
    • Azure Databricks: Data Processing
    • Azure Synapse Analytics (Synapse): Data Warehouse
  • Terraform: Infrastructure-as-Code (IaC)
  • Docker: Containerization
  • SQL: Data Analysis & Exploration
  • Prefect: Workflow Orchestration
  • dbt: Data Transformation
  • Spark: Distributed Processing
  • Kafka: Streaming