Projects done in the Data Engineer Nanodegree Program by Udacity.com
Updated Dec 8, 2022 - Jupyter Notebook
Documentation for getting up and running with indexed.xyz data
This repository holds the files and notebooks I produced during Udacity's Data Engineering Nanodegree program.
A Semantic Data Reservoir for Heterogeneous Datasets
Codebase and data for our paper - Pylon: Semantic Table Union Search in Data Lakes.
A Search Join is a join operation which extends a user-provided table with additional attributes based on a large corpus of heterogeneous data originating from the Web or corporate intranets.
Repository for discussion of the DTF software architecture
Follow along with materials in the book "Modern Data Architectures with Python: A practical guide to building and deploying data pipelines, data warehouses and data lakes" (Lipp, 2023)
The project aims to create a data architecture that connects marketing, logistics, and inventory, automating order processing and report generation across departments for an industrial and medical gas company. The system will streamline data flow and enhance communication across departments.
Data Engineering Nanodegree Program