🤩
A surprising multimodal large model will be released soon!

Highlights

  • Pro

VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation

Python 121 2 Updated Oct 24, 2024
258 114 Updated Apr 25, 2022

Allegro is a powerful text-to-video model that generates high-quality videos up to 6 seconds at 15 FPS and 720p resolution from simple text input.

Python 555 39 Updated Oct 31, 2024

Next-Token Prediction is All You Need

Python 1,795 69 Updated Oct 24, 2024

Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Python 5,897 500 Updated Nov 4, 2024

PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation

Python 1,669 83 Updated Oct 31, 2024
Python 189 4 Updated Jul 15, 2024

This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to this project.

Python 11,531 1,023 Updated Nov 6, 2024

PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838

Python 986 55 Updated Sep 27, 2024

This repo is for processing the large video dataset WebVid-10M, including sampling, tracking, and so on.

Python 2 Updated Jul 12, 2024

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

Python 1,751 113 Updated Oct 30, 2024

MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment

Python 31 Updated Jul 1, 2024
Python 71 1 Updated Jul 7, 2024

This is an implementation of an LLM attack on ChatGLM2.

Python 2 Updated Oct 30, 2024

Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.

Python 1,827 112 Updated Jul 29, 2024

The conventional commits specification

SCSS 7,096 552 Updated Oct 23, 2024
Python 108 7 Updated Jun 6, 2024

Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation

Python 1,303 56 Updated Aug 15, 2024

This repository contains demos I made with the Transformers library by HuggingFace.

Jupyter Notebook 9,443 1,447 Updated Oct 21, 2024

[NeurIPS 2024 Oral][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-sim…

Python 4,226 315 Updated Oct 6, 2024

Ongoing research training transformer models at scale

Python 10,499 2,351 Updated Nov 9, 2024

Open-Sora: Democratizing Efficient Video Production for All

Python 22,182 2,168 Updated Aug 9, 2024

Grok open release

Python 49,529 8,317 Updated Aug 30, 2024

DALL·E Mini - Generate images from a text prompt

Python 14,750 1,208 Updated Nov 9, 2023

Modeling, training, eval, and inference code for OLMo

Python 4,610 469 Updated Nov 9, 2024

Emu Series: Generative Multimodal Models from BAAI

Python 1,659 86 Updated Sep 27, 2024
Python 675 70 Updated Nov 4, 2024

Official Code for Stable Cascade

Jupyter Notebook 6,539 533 Updated Jul 25, 2024

Large World Model -- Modeling Text and Video with Millions Context

Python 7,138 551 Updated Oct 19, 2024