Tsinghua University - Beijing, China
https://www.yuangpeng.com - @yuang_peng
Stars
VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation
Allegro is a powerful text-to-video model that generates high-quality videos up to 6 seconds at 15 FPS and 720p resolution from simple text input.
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to it.
PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838
This repo is for processing the large video dataset WebVid-10M, including sampling, tracking, and more.
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment
This is an implementation of an LLM attack on ChatGLM2.
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
The conventional commits specification
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
This repository contains demos I made with the Transformers library by HuggingFace.
[NeurIPS 2024 Oral][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-sim…
Ongoing research training transformer models at scale
Open-Sora: Democratizing Efficient Video Production for All
DALL·E Mini - Generate images from a text prompt
Modeling, training, eval, and inference code for OLMo
Emu Series: Generative Multimodal Models from BAAI
Official Code for Stable Cascade
Large World Model -- Modeling Text and Video with Millions Context