bradlensing/DataEngineering-DataModeling

Data Modeling

Data modeling is an abstraction that organizes elements of data and how they relate to each other.

  • A well-organized model makes the data easier for everyone to find, understand, and use.
  • Thinking the data model through beforehand is crucial, because it determines how the data can be consumed later.
  • The goal is to keep queries simple and straightforward.
  • Start early and build in flexibility to allow for changes later.

Schemas, Fact and Dimensional Modeling

  • Fact tables
    • Record business events, such as orders and reviews, along with the IDs that identify them.
    • Record events as quantifiable metrics.
    • Values are numeric and additive.
  • Dimension tables
    • Record the context for the event: who, what, where, and why.
    • Columns hold the descriptive attributes, which can be text or numeric.
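
As a minimal sketch of the fact/dimension split described above, the example below builds a tiny star schema in an in-memory SQLite database (chosen only so the snippet runs without a server). The table names, columns, and sample rows are all hypothetical and are not taken from this repository.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Dimension tables: descriptive context for each event (who, what).
cur.execute(
    "CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)
cur.execute(
    "CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, title TEXT, category TEXT)"
)

# Fact table: one row per order event, holding numeric, additive measures
# plus the IDs that point at the dimension rows.
cur.execute("""
    CREATE TABLE fact_orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES dim_customer(customer_id),
        product_id  INTEGER REFERENCES dim_product(product_id),
        quantity    INTEGER,
        amount      REAL
    )
""")

# Hypothetical sample data.
cur.executemany(
    "INSERT INTO dim_customer VALUES (?, ?, ?)",
    [(1, "Ada", "London"), (2, "Grace", "New York")],
)
cur.executemany(
    "INSERT INTO dim_product VALUES (?, ?, ?)",
    [(10, "Keyboard", "Hardware"), (11, "Monitor", "Hardware")],
)
cur.executemany(
    "INSERT INTO fact_orders VALUES (?, ?, ?, ?, ?)",
    [(100, 1, 10, 2, 50.0), (101, 1, 11, 1, 200.0), (102, 2, 10, 1, 25.0)],
)

# Because the measures are additive, a question like "revenue per city"
# is a single JOIN plus SUM.
cur.execute("""
    SELECT c.city, SUM(f.amount)
    FROM fact_orders AS f
    JOIN dim_customer AS c ON c.customer_id = f.customer_id
    GROUP BY c.city
    ORDER BY c.city
""")
revenue_by_city = cur.fetchall()
print(revenue_by_city)
```

Keeping the measures in the fact table and the descriptive attributes in the dimensions is what keeps the final query short.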

Relational Databases (SQL) vs NoSQL Databases

Relational

  • Postgres, Oracle, MySQL, MSSQL, and many more.
  • A collection of tables with columns and rows.
  • Ability to do JOINs, aggregations, and analytics.
  • ACID transactions, intended to guarantee that the data stays valid even when an operation fails.
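
To illustrate the transaction point above, here is a small sketch of atomicity using Python's built-in sqlite3 module as a stand-in for a relational database. The `accounts` table, the `transfer` helper, and the balances are hypothetical examples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one unit; return True on success."""
    try:
        # The connection context manager commits if the block succeeds
        # and rolls back if an exception is raised inside it.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance; the credit to
        # dst made earlier in the same transaction was rolled back too.
        return False


ok = transfer(conn, "alice", "bob", 30)       # valid transfer
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

The second transfer credits bob before the debit fails, yet bob's balance is unchanged afterwards. The transaction applies both updates or neither.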

NoSQL

  • Cassandra, MongoDB, Redis, DynamoDB, and many more.
  • Flexible schemas, often stored as key:value pairs.
  • Well suited to large amounts of data in many formats, including unstructured data.
  • High throughput and fast reads.
  • Horizontal scalability: add more nodes or machines to increase capacity.
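
The key:value and horizontal-scaling ideas above can be sketched in plain Python. This is a deliberately simplified toy, not how any of the listed databases is implemented: it routes each key to a node by hashing it modulo the node count, whereas real systems typically use more elaborate partitioning and replication. The class name `TinyKeyValueCluster` is hypothetical.

```python
import hashlib


class TinyKeyValueCluster:
    """Toy key:value store that spreads keys across several in-memory 'nodes'."""

    def __init__(self, node_count):
        # Each node is just a dict here; in practice it would be a machine.
        self.nodes = [dict() for _ in range(node_count)]

    def _node_for(self, key):
        # Hash the key so the same key always lands on the same node.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return self.nodes[int(digest, 16) % len(self.nodes)]

    def put(self, key, value):
        self._node_for(key)[key] = value

    def get(self, key, default=None):
        return self._node_for(key).get(key, default)


cluster = TinyKeyValueCluster(node_count=3)

# Values need not share a shape: no schema change is required
# to store a record with different fields.
cluster.put("user:1", {"name": "Ada", "tags": ["admin"]})
cluster.put("user:2", {"name": "Grace"})

print(cluster.get("user:1"))
print(cluster.get("user:404"))
```

A read only touches the one node that owns the key, which is one reason lookups by key stay fast as nodes are added.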

Modeling with Postgres

Using Python in the terminal and in Jupyter Notebooks, with the psycopg2 package, to connect to a local Postgres database.
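
A minimal sketch of that workflow might look like the following. The connection settings (`studentdb`, `student`) and the `songs` table are hypothetical placeholders, so substitute the values for your own local Postgres instance. The driver is imported inside `main()` only so the file can be loaded on a machine without psycopg2 installed.

```python
CREATE_SONGS = """
    CREATE TABLE IF NOT EXISTS songs (
        song_id SERIAL PRIMARY KEY,
        title   TEXT NOT NULL,
        artist  TEXT NOT NULL,
        year    INT
    )
"""

# psycopg2 uses %s placeholders and fills them in safely from a tuple.
INSERT_SONG = "INSERT INTO songs (title, artist, year) VALUES (%s, %s, %s)"


def main():
    # Imported lazily so loading this file does not require the driver.
    import psycopg2

    # Hypothetical credentials for a local database.
    conn = psycopg2.connect(
        host="127.0.0.1", dbname="studentdb", user="student", password="student"
    )
    conn.autocommit = True  # commit each statement immediately
    cur = conn.cursor()

    cur.execute(CREATE_SONGS)
    cur.execute(INSERT_SONG, ("Let It Be", "The Beatles", 1970))

    cur.execute("SELECT title, artist, year FROM songs")
    for row in cur.fetchall():
        print(row)

    cur.close()
    conn.close()
```

Call `main()` from a terminal session or a notebook cell once the database is running.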

Modeling with Cassandra

I will be running Cassandra in a Docker container, started either through Docker Desktop or through the VS Code Docker interface, and using Python with the cassandra package to connect to and interface with the Cassandra database.
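
As a rough sketch of that setup, the snippet below connects with the DataStax Python driver, which is installed as `cassandra-driver` and imported as `cassandra`. It assumes the container publishes Cassandra's default port 9042 on localhost. The keyspace `demo` and the table `songs_by_year` are hypothetical.

```python
CREATE_KEYSPACE = """
    CREATE KEYSPACE IF NOT EXISTS demo
    WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 1}
"""

# The table is designed around one query: "all songs for a given year".
# `year` is the partition key; artist and title are clustering columns.
CREATE_TABLE = """
    CREATE TABLE IF NOT EXISTS songs_by_year (
        year   int,
        artist text,
        title  text,
        PRIMARY KEY (year, artist, title)
    )
"""

INSERT_SONG = "INSERT INTO songs_by_year (year, artist, title) VALUES (%s, %s, %s)"
SELECT_BY_YEAR = "SELECT artist, title FROM songs_by_year WHERE year = %s"


def main():
    # Imported lazily so loading this file does not require the driver.
    from cassandra.cluster import Cluster

    cluster = Cluster(["127.0.0.1"])  # assumes the default port, 9042
    session = cluster.connect()

    session.execute(CREATE_KEYSPACE)
    session.set_keyspace("demo")
    session.execute(CREATE_TABLE)

    session.execute(INSERT_SONG, (1970, "The Beatles", "Let It Be"))
    for row in session.execute(SELECT_BY_YEAR, (1970,)):
        print(row.artist, row.title)

    cluster.shutdown()
```

Unlike the relational examples, the table here is shaped by the query it must serve. Cassandra has no JOINs, so each access pattern generally gets its own table.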