- London
Stars
Making large AI models cheaper, faster and more accessible
Official implementation of the paper "BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec"
LLaQo, a Large Language Query-based Coach in the domain of expressive performance
A toolkit for computing Fréchet Inception Distance (FID) & Fréchet Video Distance (FVD) metrics.
haoheliu / fid-metrics
Forked from npurson/fid-metrics. A toolkit for computing Fréchet Inception Distance (FID) & Fréchet Video Distance (FVD) metrics.
WaveGAN on the GTZAN music genre classification dataset
JinhuaLiang / bigvgan
Forked from NVIDIA/BigVGAN. Official PyTorch implementation of BigVGAN (ICLR 2023)
OmniTokenizer: one model and one weight for image-video joint tokenization.
[CVPR'23] MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation
Official JAX implementation of MAGVIT: Masked Generative Video Transformer
Official implementation of Diffusion Autoencoders
PyTorch Lightning + Hydra. A very user-friendly template for ML experimentation. ⚡🔥⚡
⏰ Collaboratively track deadlines of conferences recommended by CCF (website, Python CLI, WeChat applet)
A family of diffusion models for text-to-audio generation.
Codebase for the paper 'EncodecMAE: Leveraging neural codecs for universal audio representation learning'
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
Official implementation of the paper "Zero-shot Image Editing with Reference Imitation"
[ICLR 2023] Official implementation of the paper "DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection"
Annotated Flow Matching paper
[NeurIPS 24] CV-VAE: A Compatible Video VAE for Latent Generative Video Models
Open-Sora: Democratizing Efficient Video Production for All
From Audio Encoders to Piano Judges: Benchmarking Performance Understanding for Solo Piano
Localized watermarking for AI-generated speech audio, with state-of-the-art robustness and a very fast detector
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
This repo contains the code for our paper "An Image is Worth 32 Tokens for Reconstruction and Generation"
Python packaging and dependency management made easy
PyTorch implementation of Tacotron speech synthesis model.
Implementation of Rotary Embeddings, from the RoFormer paper, in PyTorch
Implementation of Video Diffusion Models, Jonathan Ho's new paper extending DDPMs to Video Generation, in PyTorch