Stars
FlashInfer: Kernel Library for LLM Serving
SparseDrive: End-to-End Autonomous Driving via Sparse Scene Representation
Quantized Attention that achieves speedups of 2.1x and 2.7x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.
This project analyzes tennis players in a video to measure their speed, ball shot speed, and number of shots. This project will detect players and the tennis ball using YOLO and also utilizes CNNs t…
SGLang is a fast serving framework for large language models and vision language models.
Qwen2-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.
PLUTO: Push the Limit of Imitation Learning-based Planning for Autonomous Driving
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
Instant voice cloning by MIT and MyShell.
A generative speech model for daily dialogue.
1 min of voice data can also be used to train a good TTS model! (few-shot voice cloning)
A minimal GPU design in Verilog to learn how GPUs work from the ground up
[ECCV 2024] Fully Sparse 3D Occupancy Prediction & RayIoU Evaluation Metric
[CVPR 2023] VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
[NeurIPS 2022 Spotlight] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
The suite of modeling video with Mamba
Code release for ActionFormer (ECCV 2022)
PySlowFast: video understanding codebase from FAIR for reproducing state-of-the-art video models.
Unofficial implementation of "TTNet: Real-time temporal and spatial video analysis of table tennis" (CVPR 2020)
RTMPose series (RTMPose, DWPose, RTMO, RTMW) without mmcv, mmpose, mmdet etc.
Hiera: A fast, powerful, and simple hierarchical vision transformer.