🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Updated Dec 2, 2024 - Python
The JavaScript data transformation and analysis toolkit inspired by Pandas and LINQ.
Desbordante is a high-performance data profiler capable of discovering many different patterns in data using various algorithms. It also supports running data cleaning scenarios with these algorithms. Desbordante has a console version and an easy-to-use web application.
Quizzes & Assignment Solutions for the Google Data Analytics Professional Certificate on Coursera. Also includes a few side resources that I found helpful.
Exploratory data analysis 📊 using Python 🐍 of a used car 🚘 database taken from Kaggle
A domain-specific probabilistic programming language for scalable Bayesian data cleaning
Wrangler Transform: A DMD system for transforming Big Data
XGBoost, LightGBM, LSTM, Linear Regression, Exploratory Data Analysis
An SQL data cleaning project
This is a binary classification problem related to Autistic Spectrum Disorder (ASD) screening in adults. Given some attributes of a person, my model predicts whether that person is likely to have ASD, using different supervised learning techniques and a Multi-Layer Perceptron.
Java DSL for (online) deduplication
This repo was created to share the files required and discussed during the online internship training program on Data Science Using Python in May 2021
Comprehensive Power BI dashboards showcasing insights on Call Centre Trends, Customer Retention, and Diversity & Inclusion to drive business impact.
Predict if a driver will file an insurance claim next year. (Kaggle Competition)
Quick and dirty data mining made easier in Sublime Text
This library contains the file system extensions to Data-Forge that allow it to directly read and write CSV and JSON files in Node.js
Data cleansing and clustering with Vector Quantization and Adaptive Resonance Theory
Product Rationalization of Pro Bikes Inc using Power BI
Data structures project in C++11 that uses custom Vector and String structures with move semantics (Rule of Five)
Data cleaning tool.