massive-datasets

Here are 23 public repositories matching this topic...

polardb / polardbx-sql

PolarDB-X is a cloud-native distributed SQL database designed for high-concurrency, massive-storage, and complex-query scenarios.

mysql distributed-transactions cloud-native high-availability relational-database high-concurrency massive-datasets htap horizontal-scaling enterprise-class

Updated Dec 20, 2024
Java

helmholtz-analytics / heat

Distributed tensors and Machine Learning framework with GPU and MPI acceleration in Python

python data-science machine-learning hpc gpu numpy mpi pytorch distributed parallelism data-analytics tensors data-processing multi-gpu mpi4py massive-datasets multi-node-cluster array-api

Updated Dec 20, 2024
Python

polardb / polardbx

PolarDB-X is a cloud-native distributed SQL database designed for high-concurrency, massive-storage, and complex-query scenarios.

mysql distributed-transactions cloud-native high-availability relational-databases high-concurrency massive-datasets htap horizontal-scaling enterprise-class

Updated Nov 29, 2024
Makefile

joshuaboud / gen-dataset

Command-line tool to quickly generate a large number of files across a large number of directories

linux benchmarking evaluation multithreading dataset dataset-generation massive-datasets cli-tool dataset-generator

Updated Feb 18, 2022
C++

FedericoBruzzone / anti-money-laundering

The project is based on the analysis of the "IBM Transactions for Anti Money Laundering" dataset published on Kaggle. The task is to implement a model which predicts whether or not a transaction is illicit, using the attribute "Is Laundering" as a label to be predicted.

machine-learning machine-learning-algorithms pyspark massive-datasets

Updated Aug 12, 2024
Jupyter Notebook

rajeshidumalla / Bloom-Filter

Building a Bloom Filter on English dictionary words

python data-science machine-learning bloom-filter data-analysis nltk-library massive-datasets

Updated Oct 7, 2021
Jupyter Notebook
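
For readers unfamiliar with the structure named in this entry, here is a minimal sketch of a Bloom filter over a word list. It is not taken from the repository: the class name `BloomFilter`, the parameters `num_bits` and `num_hashes`, the three sample words, and the choice of seeded SHA-256 as the hash family are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter; sizes and hash scheme are assumptions."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One bit per slot, packed into a bytearray.
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by prefixing the item with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


# Hypothetical stand-in for a dictionary word list.
words = ["apple", "banana", "cherry"]
bloom = BloomFilter()
for word in words:
    bloom.add(word)
```

A membership test such as `"apple" in bloom` never returns a false negative for an added word, but it can return a false positive for a word that was never added; larger `num_bits` lowers that rate.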

FedericoBruzzone / algorithms-for-massive-datasets

This repository contains a LaTeX file that generates a PDF document comprising comprehensive notes for the course "Algorithms for Massive Datasets"

deep-learning algorithms recommender-system massive-datasets unimi linkanalysis

Updated Aug 12, 2024
TeX

gmalik9 / floating_point_data_compressor

gipa -- compression/decompression tool to package, compress, and encode massive archive files containing floating-point data

compression data-visualization autoencoder compressor data-compression representation representation-learning floating-point massive-datasets

Updated Sep 14, 2017
Python

rajeshidumalla / PageRank

Building the PageRank algorithm on the web graph around Stanford.edu using the NetworkX Python library

python data-science machine-learning spark numpy pagerank-algorithm pandas data-analysis massive-datasets networkx-library

Updated Oct 7, 2021
Jupyter Notebook

diem-ai / google-bigquery

A series of SQL exercises on working with databases, using Google BigQuery to scale to massive datasets, taught by educators on Kaggle.com

python bigquery sql analytics kaggle massive-datasets

Updated Jul 9, 2019
Jupyter Notebook

Alex4gtx / Massive-Data-Handler

Opens and manipulates massive text/data files that would otherwise be impossible to open on an ordinary computer, for example a text file larger than 20 GB. It works on the file by fetching only the lines that are needed, so the computer does not freeze from running out of memory.

big-data dictionaries python-script massive-datasets manipulacao-arquivos
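
The idea in this entry, reading only the needed lines of a file too large to load, can be sketched as follows. This is not the repository's code: the function name `read_lines` and its parameters are hypothetical, and the sketch assumes a newline-delimited text file.

```python
from itertools import islice


def read_lines(path, start, count, encoding="utf-8"):
    """Return up to `count` lines beginning at 0-based line `start`.

    The file is streamed line by line, so memory use depends on the
    requested slice rather than on the total file size.
    """
    with open(path, "r", encoding=encoding) as handle:
        # islice skips the first `start` lines lazily, then stops
        # after `count` more; the rest of the file is never read.
        wanted = islice(handle, start, start + count)
        return [line.rstrip("\n") for line in wanted]
```

Skipping to line `start` still requires scanning the preceding lines once; a tool that needs repeated random access would typically build an index of byte offsets first, which this sketch leaves out.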