data-processing

Here are 1,289 public repositories matching this topic...

onceupon / Bash-Oneliner

A collection of handy Bash One-Liners and terminal tricks for data processing and Linux system maintenance.

linux shell bash terminal system hardware grep data-processing variables xargs xwindow one-liners linux-administration oneliner-commands shell-oneliner

Updated Aug 29, 2024

johnkerl / miller

Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON

Updated Dec 23, 2024
Go

pathwaycom / pathway

Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.

python rust streaming real-time kafka etl machine-learning-algorithms stream-processing data-analytics dataflow data-processing data-pipelines batch-processing pathway iot-analytics etl-framework time-series-analysis

Updated Dec 24, 2024
Python

TomWright / dasel

Select, put and delete data from JSON, TOML, YAML, XML and CSV files with a single tool. Supports conversion between formats and can be used as a Go package.

config go cli golang yaml toml parser json query xml configuration update selector data-structures data-wrangling devops-tools data-processing yaml-processor json-processing

Updated Dec 23, 2024
Go

NVIDIA / DALI

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.

python machine-learning deep-learning neural-network mxnet gpu image-processing pytorch gpu-tensorflow data-processing data-augmentation audio-processing paddle image-augmentation fast-data-pipeline

Updated Dec 20, 2024
C++

unionai-oss / pandera

A light-weight, flexible, and expressive statistical data testing library

testing schema validation data-validation pandas-dataframe assertions pandas testing-tools data-processing dataframes data-cleaning hypothesis-testing data-verification pandas-validation data-check data-assertions dataframe-schema pandas-validator

Updated Dec 23, 2024
Python

dashbitco / broadway

Concurrent and multi-stage data ingestion and data processing with Elixir

elixir broadway concurrent data-processing genstage data-ingestion

Updated Dec 22, 2024
Elixir

asyml / texar

Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/

python machine-learning natural-language-processing deep-learning tensorflow machine-translation text-generation data-processing bert text-data dialog-systems gpt-2 texar xlnet casl-project

Updated Aug 26, 2021
Python

microsoft / DialoGPT

Large-scale pretraining for dialogue

machine-learning dialogue text-generation pytorch transformer data-processing text-data gpt-2 dialogpt

Updated Oct 17, 2022
Python

numaproj / numaflow

Kubernetes-native platform to run massively parallel data/streaming jobs

kubernetes pipeline stream-processing map-reduce k8s data-processing hacktoberfest

Updated Dec 24, 2024
Go

bytewax / bytewax

Python Stream Processing

python rust data-science machine-learning stream-processing data-engineering dataflow data-processing streaming-data

Updated Dec 10, 2024
Python

python-bonobo / bonobo

Extract Transform Load for Python 3.5+

automation parallelization python3 data-processing bonobo extract-transform-load

Updated May 12, 2023
Python

GoogleCloudPlatform / data-science-on-gcp

Source code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017

data-science machine-learning data-visualization data-engineering cloud-computing data-analysis data-processing data-pipeline

Updated May 1, 2024
Jupyter Notebook

allenai / dolma

Data and tools for generating and inspecting OLMo pre-training data.

nlp data-processing machile-learning large-language-models llm

Updated Dec 24, 2024
Python

microsoft / GODEL

Large-scale pretrained models for goal-directed dialog

machine-learning dialogue transformers text-generation pytorch transformer data-processing language-model dialogue-systems text-data conversational-ai language-grounding pretrained-model dialogpt grounded-generation

Updated Dec 10, 2023
Python

GoogleCloudPlatform / DataflowJavaSDK

Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.

data-science data-mining big-data data-analysis google-cloud-dataflow data-processing

Updated Nov 25, 2020

jofpin / synthBTC

A tool that uses advanced Monte Carlo simulations and Turbit parallel processing to create possible Bitcoin prediction scenarios.

nodejs bitcoin prediction monte-carlo-simulation data-processing synthetic-data turbit

Updated Aug 4, 2024
JavaScript

asyml / texar-pytorch

Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CASL project: http://casl-project.ai/

python machine-learning natural-language-processing deep-learning machine-translation text-generation pytorch data-processing bert text-data dialog-systems roberta gpt-2 texar xlnet casl-project texar-pytorch

Updated Apr 14, 2022
Python

hstreamdb / hstream

HStreamDB is an open-source, cloud-native streaming database for IoT and beyond. Modernize your data stack for real-time applications.

iot distributed-systems haskell streaming real-time sql database kafka scale stream-processing distributed-database realtime-database data-processing financial-analysis streaming-data materialized-view iot-database hstreamdb streaming-database

Updated Sep 27, 2024
Haskell

benibela / xidel

Command line tool to download and extract data from HTML/XML pages or JSON APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq, or pattern matching. It can also create new or transformed XML/HTML/JSON documents.

Updated Apr 20, 2024
Pascal