- System / Infra
- Compute & Storage
- Grid computing / Supercomputing
- Cloud services
- Tools
- CPU
- FPGA
- GPU
- TPU
- IPU
- Performance
- Misc
- Contributing
- serveo.net - Serveo is an SSH server for remote port forwarding: when a user connects to Serveo, they get a public URL that anybody can use to reach their localhost server. See the link for other SSH-based alternatives, useful for serving resources across devices, e.g. accessing a GPU or other hardware accelerator on another machine remotely. | How to forward my local port to public using Serveo? | Serveo on GitHub
- Inlets by Alex Ellis | Get started | Video
- KnockKnock by @huggingface | tweet
- Cray Computers | Artificial Intelligence | Accel AI | Cryo-EM | Autonomous Vehicles | Geospatial AI
- GraphCore's IPU
- Lambda Labs
- NGD Systems: Technology [deadlink] | Solutions - High Compute Storage, Scalable Computational Storage [deadlink] | NGD Systems: Ensuring AI Advancement with Intelligent Storage
- Grid Engine: wikipedia | Univa website | Datasheet
- BOINC - High-Throughput Computing with BOINC | Tech Docs | Download BOINC | GitHub
- Cray Computers - Supercomputing as a Service
- vast.ai - GPU Sharing Economy. One simple interface to find the best cloud GPU rentals. Reduce cloud compute costs by 3X to 5X
- paperspace - The first cloud built for the future. Powering next-generation applications and cloud ML/AI pipelines. Paperspace is built to scale with your team, with a pay-as-you-go option for individuals.
- NextJournal - The notebook for reproducible research
- valohai | docs | blogs | GitHub | Videos | Showcase | Slack | @valohaiai - Valohai is a machine learning platform: it runs your experiments in the cloud, tracks your experiment history and streamlines data science workflows. A deep learning management platform offering machine orchestration, version control and pipeline management for deep learning.
- Lambda Cloud GPU Instances - GPU Instances for Deep Learning & Machine Learning
- NavOps - Cloud Migration for HPC | Datasheet
- Verne Global: HPC Cloud | NVIDIA DGX Ready
- Weights and Biases | Learn more about WandB
- Marvin AI: About Marvin AI | Apache Marvin AI: MLOps platform | GitHub | Video
- RealityEngine.ai | Research | Blogs
- Videos
- Notebooks
- Workshop: Unsupervised Learning and Deep Learning Based Forecasting: Anomaly Workbook | Forecasting Workbook
- AutoML Core Concepts and Hands-On Workshop: Regression Notebook | Classification Notebook
- Workshop: Large Scale Deep Learning Recommender
- Reality Engines Demo
- Accelerating AI Training with MLPerf Containers and Models from NVIDIA NGC
- Running AI Models in the Cloud: site | video | Docs | Getting started
- snakemake - The Snakemake workflow management system is a tool to create reproducible and scalable data analyses. Slides | PyPi
- plz - Plz (pronounced "please") runs your jobs, storing the code, inputs, outputs and results so that they can be queried programmatically.
- valohai | docs | blogs | GitHub | Videos | Showcase | Slack - Valohai is a machine learning platform: it runs your experiments in the cloud, tracks your experiment history and streamlines data science workflows. A deep learning management platform offering machine orchestration, version control and pipeline management for deep learning.
- Seldon - Model deployment platform for Kubernetes clusters. | docs | github | use-cases | blogs | videos | Seldon's open-source library for machine-learning model inspection and interpretation
- Arize AI | docs | certification | resources | Slack - Model monitoring and observability platform. Community edition offers model performance tracing, data quality checks, explainability, and drift detection -- including embedding drift detection for CV and NLP models.
- kedro | other kedro projects | docs | Kedro-Viz | kedro-examples | Blogs | Video | gitter.im/py-sprints/kedro | pypi - Kedro is a workflow development tool that helps you build data pipelines that are robust, scalable, deployable, reproducible and versioned.
- Lambda Stack - One-line installation of TensorFlow, Keras, Caffe, Caffe2, CUDA, cuDNN, and NVIDIA drivers for Ubuntu 16.04 and 18.04.
- Apache Airflow - Airflow is a platform to programmatically author, schedule and monitor workflows. Use airflow to author workflows as directed acyclic graphs (DAGs) of tasks. The airflow scheduler executes your tasks on an array of workers while following the specified dependencies.
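Airflow's core abstraction is the task DAG: the scheduler starts a task only once everything upstream of it has finished. The dependency-free sketch below (hypothetical helper names, not Airflow's API) illustrates that idea as a plain topological sort over a task-to-dependencies mapping.

```python
# Sketch of DAG-based task scheduling (the concept behind Airflow's
# scheduler, not its actual implementation or API).
from collections import deque

def schedule(deps):
    """Return a run order in which every task follows its dependencies."""
    pending = {task: set(d) for task, d in deps.items()}
    ready = deque(sorted(t for t, d in pending.items() if not d))
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        # a finished task unblocks anything that depended on it
        for t, d in pending.items():
            if task in d:
                d.remove(task)
                if not d and t not in order and t not in ready:
                    ready.append(t)
    if len(order) != len(pending):
        raise ValueError("cycle detected - not a DAG")
    return order

# extract -> transform -> load, plus a report that only needs extract
dag = {"extract": [], "transform": ["extract"],
       "load": ["transform"], "report": ["extract"]}
print(schedule(dag))  # "extract" always runs first
```

In real Airflow the same structure is declared with operators and the `>>` dependency syntax, and the scheduler dispatches ready tasks to a pool of workers instead of running them inline.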
- Nextflow - Data-driven computational pipelines. Nextflow enables scalable and reproducible scientific workflows using software containers. It allows the adaptation of pipelines written in the most common scripting languages.
- StackHPC suites of repositories: AI, ML, DL, Cloud, HPC | StackHPC
- cortex - Machine learning deployment platform: Deploy machine learning models to production
- Uber introduces Fiber, an AI development and distributed training platform for methods including reinforcement learning and population-based learning. | Uber Open-Sources Fiber - A New Library For Distributed Machine Learning
- A curated list of awesome pipeline toolkits inspired by Awesome Sysadmin
- H2O Framework for Machine Learning
- ML Framework: Introducing Ludwig, a Code-Free Deep Learning Toolbox | Ludwig is a toolbox built on top of TensorFlow that allows you to train and test deep learning models without writing code
- ML Pipelines
- Large SVDs Dask + CuPy + Zarr + Genomics
- Determined AI | About Neil Conway | Determined: Open-source Deep Learning Training Platform
- ML Framework by Abhishek Thakur
- See also: Data > Programs and Tools
- Probing the CPU (Linux/MacOS)
- Zero overhead performance capturing: use `/proc/interrupts` and `/proc/softirqs`
- Non-zero overhead, less accurate: use the PMU (capture on- and off-core events)
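Sampling `/proc/interrupts` before and after a workload is the zero-overhead technique above: the kernel maintains these counters anyway, so reading them costs nothing extra. Below is a minimal, hypothetical parser sketch, assuming the standard Linux layout (a header row of CPU columns, then one labelled row per interrupt source); on a real system you would read the file twice and diff the counts.

```python
# Parse the per-CPU interrupt counters from /proc/interrupts.
# Embedded sample text stands in for the real file so the sketch is
# self-contained; lines with fewer columns than CPUs also parse.

SAMPLE = """\
           CPU0       CPU1
  0:         45          0   IO-APIC    2-edge      timer
 30:       1024       2048   PCI-MSI 524288-edge   eth0
NMI:          3          7   Non-maskable interrupts
"""

def parse_interrupts(text):
    """Return {irq_label: [per-CPU counts]}."""
    lines = text.splitlines()
    ncpus = len(lines[0].split())              # header row: CPU0 CPU1 ...
    counts = {}
    for line in lines[1:]:
        label, rest = line.split(":", 1)
        fields = rest.split()
        counts[label.strip()] = [int(f) for f in fields[:ncpus]]
    return counts

table = parse_interrupts(SAMPLE)
print(table["30"])  # per-CPU counts for the eth0 interrupt: [1024, 2048]
```

To use it live, replace `SAMPLE` with `open("/proc/interrupts").read()` (Linux only). Note that some summary rows (e.g. `ERR:`) carry a single total rather than per-CPU columns, so a production parser needs a little more care.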
- Probing the CPU (Windows)
- perfview - general profiling on Windows
- perfview for .net - excellent overview by Sasha Goldshtein
- Neural Magic: GPU-class performance on CPU
- Intel
- Intel® Developer Zone
- Intel® AI Developer Home Page
- Intel® AI Developer Webinar Series | All webinars listing
- The PlaidML Tensor Compiler - webinar
- nGraph - Unlocking next-generation performance with deep learning compilers: webinar | slides | homepage | github
- Intel Debug memory & threading bugs: Webinar slides | Intel Inspector | Inspector Docs | Intel® Parallel Studio XE | Intel® System Studio
- Intel Analysers/Profilers:
- Intel® DevCloud for oneAPI
- Tuning applications for multiple architectures
- Also see Intel in Courses
- TVM is an open deep learning compiler stack for CPUs, GPUs, and specialized accelerators. It aims to close the gap between productivity-focused deep learning frameworks and performance- or efficiency-oriented hardware backends.
Thanks to the great minds on the mechanical sympathy mailing list for their responses to my queries on CPU probing.
- Using FPGAs for Datacenter Acceleration | Windows AI | Intel® Distribution of OpenVINO™ Toolkit: Develop Multiplatform Computer Vision Solutions
- Also see FPGA in Courses
- Know your GPU
- GPU Server 1 of 2 | GPU Server 2 of 2 | Applications of GPU servers - check out the manufacturers
- Embedded Vision Solutions for NVIDIA Jetson Series | Embedded Vision Family Brochure
- Avermedia Box PC & Carrier (works with NVIDIA Jetson): 1 | 2
- Accelerating Wide & Deep Recommender Inference on GPUs
- Create GPU Arrays and Move to DL Frameworks with DLPack
- GPU Accelerated data viz tools
- This tool is handy for monitoring not only RAPIDS but also deep learning workloads
- InstaDeep™ powers AI as a Service with shared NVMe: Excelero NVMesh™ feeds unlimited streams of data to GPU-based systems with local performance for AI and ML end-users
- See NVIDIA's RAPIDS
- A NumPy-compatible array library accelerated by CUDA
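Because CuPy mirrors the NumPy API, GPU acceleration can often be a drop-in swap. A common pattern is to alias whichever library is available to a single namespace and write everything against it; this is a sketch assuming either `cupy` (with a CUDA device) or plain `numpy` is installed.

```python
# Drop-in NumPy/CuPy pattern: the same code runs on CPU or GPU depending
# on which backend is importable.
try:
    import cupy as xp      # GPU arrays, CUDA kernels under the hood
except ImportError:
    import numpy as xp     # identical API on the CPU

a = xp.arange(6, dtype=xp.float32).reshape(2, 3)   # rows [0,1,2], [3,4,5]
b = (a * 2).sum(axis=0)    # per-column sums: [6, 10, 14] on either backend
print(b)
```

The main caveat with the real CuPy is data placement: results live on the GPU until you call `cupy.asnumpy` (or similar) to bring them back to the host.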
- For ML practitioners, from NVIDIA GTC: a catalog of resources for Apache Spark on GPUs using RAPIDS and other NVIDIA libraries (Deployment on GCP | Architectural e-Book | Use cases for Adobe & Verizon | Pipelines & Hyperparameter Tuning)
- Explore GPU Acceleration in the Intel® DevCloud (slides)
- Offload Your Code from CPU to GPU … and Optimize It
- Profile DPC++ and GPU Workload Performance
- How to harness the power of the Cloud TPU
- How-tos
- All tutorials
- Command-line interface
- Cloud TPU tools
- Performance Guide
- TPU Estimator API
- Using bfloat16
- Advanced Guide to Inception V3 on Cloud TPU
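bfloat16, covered by the "Using bfloat16" guide above, is simply the top 16 bits of an IEEE-754 float32: the full 8-bit exponent is kept, so dynamic range matches float32, but only 7 mantissa bits survive (about 3 decimal digits of precision) while memory and bandwidth are halved. A pure-Python illustration of that truncation follows; the round-to-nearest-even emulation is a standard trick and an assumption of this sketch, not code from the guide.

```python
import struct

def to_bfloat16(x):
    """Emulate float32 -> bfloat16 rounding and widen the result back to
    float. bfloat16 = top 16 bits of a float32: 1 sign, 8 exponent,
    7 mantissa bits."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)          # round to nearest even
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

print(to_bfloat16(1.0))         # 1.0 - powers of two are exact
print(to_bfloat16(3.14159265))  # 3.140625 - only ~3 decimal digits kept
```

Because the exponent field is unchanged, values that overflow in float16 (e.g. large activations or gradients) still fit in bfloat16, which is why TPUs can use it without loss scaling.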
- Examples
- Using TPUs docs
- Hello, TPU in Colab notebook
- Useful TPU and Model example
- Measure Performance on TPU, in a notebook
- Financial Time series notebook
- Web traffic prediction
- GAN example, TPU version
- XLA compiler: GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding ~ 1 Trillion parameters
- GraphCore | Videos: Simon Knowles - More complex models and more powerful machines | Graphcore tech concept | A new kind of hardware designed for machine intelligence - GraphCore presentations | Video: Scaling throughput processors for machine intelligence
- What makes the IPU's architecture so efficient
- How to implement large-scale NLP models on the IPU
- Graphcore is making its Poplar software documentation publicly available
- Graphcore's quick guide to the IPU (LinkedIn post)
- Dissecting the Graphcore IPU Architecture via Microbenchmarking
- Learn how to develop and train models for the Graphcore IPU using TensorFlow
- Graphcore C2 Card performance for image-based deep learning application
- Graphcore Whitepaper: DELL DSS8440 GRAPHCORE IPU SERVER
- Product brief: IPU-MACHINE: M2000
- MOOR INSIGHTS - GRAPHCORE SOFTWARE STACK: BUILT TO SCALE
- INTELLIGENT MEMORY FOR INTELLIGENT COMPUTING
- Graphcore Benchmarks
- Graphcore GitHub
- MLPerf - Fair and useful benchmarks for measuring training and inference performance of ML hardware, software, and services.
- MLPerf introduces machine learning inference benchmark suite...
- ONE DEEP LEARNING BENCHMARK TO RULE THEM ALL
- mlbench: Distributed Machine Learning Benchmark - A public and reproducible collection of reference implementations and benchmark suite for distributed machine learning algorithms, frameworks and systems.
- EEMBC MLMark Benchmark - The EEMBC MLMark benchmark is a machine-learning (ML) benchmark designed to measure the performance and accuracy of embedded inference.
- DeepOBS: A Deep Learning Optimizer Benchmark Suite
- PMLB - a large benchmark suite for machine learning evaluation and comparison
- Deep Learning Benchmarking Suite | HPE Deep Learning Cookbook
- Hyperdimensional Computing: An Introduction to Computing in Distributed Representation with High-Dimensional Random Vectors
- Performance profiling in TF 2 (TF Dev Summit '20)
Contributions are very welcome, please share back with the wider community (and get credited for it)!
Please have a look at the CONTRIBUTING guidelines, also have a read about our licensing policy.
Back to main page (table of contents)