
One framework to develop, deploy and operate data workflows with Python and SQL.


Seandagan/versatile-data-kit


Versatile Data Kit


Overview

Versatile Data Kit (VDK) is an open-source framework that enables anyone with basic SQL or Python knowledge to build, run, and manage their own data workflows.

Data processing instructions are written as plain-text SQL or Python files, which VDK executes sequentially in alphanumeric order of their file names. A data workflow is therefore an ordered set of step files.
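To make the ordering rule concrete, here is a minimal sketch of how a set of step files would be sequenced. It assumes "alphanumeric order" means a plain lexicographic sort of file names; the file names and the `execution_order` helper are hypothetical and are not part of VDK.

```python
# Hypothetical contents of a data job directory (names are illustrative only).
step_files = [
    "20_transform.sql",
    "10_ingest.py",
    "30_export.py",
    "README.md",
    "config.ini",
]


def execution_order(files):
    """Return the SQL and Python step files in lexicographic order.

    Assumption: only .sql and .py files count as steps, and ordering is a
    plain string sort of the file names.
    """
    steps = [name for name in files if name.endswith((".sql", ".py"))]
    return sorted(steps)


for name in execution_order(step_files):
    print(name)
```

Under this assumption, `10_step.sql` sorts before `2_step.sql`, because the names are compared character by character. Zero-padded prefixes such as `02_` and `10_` avoid that surprise.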

VDK is built for resiliency: a workflow can recover mid-process or restart from the beginning.

Data Journey and Versatile Data Kit

VDK creates data processing workflows to:

  • Ingest data (extract)
  • Transform data (transform)
  • Export data (load)
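As a rough illustration of the first two stages, the sketch below shows what a Python step might look like. It assumes VDK's convention that a Python step defines a `run(job_input)` function and that the job input offers `send_object_for_ingestion` and `execute_query`. The table names, the payload, and the `FakeJobInput` stand-in are hypothetical; the stand-in exists only so the snippet runs without VDK installed.

```python
class FakeJobInput:
    """Hypothetical stand-in for VDK's job input; it only records the calls."""

    def __init__(self):
        self.ingested = []
        self.queries = []

    def send_object_for_ingestion(self, payload, destination_table):
        # Record the payload instead of sending it to a real ingestion target.
        self.ingested.append((destination_table, payload))

    def execute_query(self, sql):
        # Record the SQL instead of running it against a real database.
        self.queries.append(sql)
        return []


def run(job_input):
    # Ingest (extract): send one record to a hypothetical raw table.
    job_input.send_object_for_ingestion(
        payload={"id": 1, "country": "BG"},
        destination_table="raw_visits",
    )
    # Transform: derive a hypothetical summary table with SQL.
    job_input.execute_query(
        "CREATE TABLE IF NOT EXISTS visits_by_country AS "
        "SELECT country, COUNT(*) AS visits FROM raw_visits GROUP BY country"
    )


job_input = FakeJobInput()
run(job_input)
print(job_input.ingested)
print(len(job_input.queries))
```

In a real data job the framework would supply `job_input`, so the stand-in class and the explicit `run(job_input)` call at the bottom would not be needed.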

Figure: Data Journey

Solve common data engineering problems

  • Ingest data from different sources, including CSV files, JSON objects, and data from REST API services.
  • Use Python/SQL and VDK templates to transform data.
  • Ensure data applications are packaged, versioned, and deployed correctly while dealing with credentials, retries, and reconnects.
  • Provide built-in monitoring and smart notification capabilities.
  • Track both code and data modifications and the relationship between them, allowing quicker troubleshooting and version rollback.

Figures: Without / With Versatile Data Kit; Without / With Versatile Data Kit code

Versatile Data Kit Components

  • Software Development Kit (SDK):
    • Tools to automate the extraction, transformation, and loading of data.
    • A plugin framework that allows users to extend the framework according to their specific requirements.
  • Control Service: Lets users create, deploy, manage, and execute data jobs in a Kubernetes runtime environment.

A preview of the VDK CLI commands:

  • vdk create
  • vdk run
  • vdk deploy

    Gif displaying Versatile Data Kit commands create, run and deploy

Getting Started

VDK installs with a single pip command. See the Getting Started guide to install VDK and create a data job.

Next Steps

Contributing

Create an issue or pull request on GitHub to submit suggestions or changes. If you are interested in contributing as a developer, visit the contributing page.

Contacts

Code of Conduct

Everyone who works on the project's source code or participates in its issue trackers, Slack channels, or mailing lists is expected to be familiar with and follow the Code of Conduct.

