Personal Project: MPP-Qwen14B & MPP-Qwen-Next (Multimodal Pipeline Parallel based on Qwen-LM). Supports [video/image/multi-image] {sft/conversations}. Don't let poverty limit your imagination! Tr…

Jupyter Notebook 375 20 Updated Sep 24, 2024

black-forest-labs / flux

Official inference repo for FLUX.1 models

Python 15,705 1,128 Updated Oct 8, 2024

QwenLM / Qwen2-VL

Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Python 2,975 175 Updated Oct 4, 2024

rasbt / LLMs-from-scratch

Implement a ChatGPT-like LLM in PyTorch from scratch, step by step

Jupyter Notebook 31,225 3,710 Updated Nov 8, 2024

facebookresearch / sapiens

High-resolution models for human tasks.

Python 4,455 244 Updated Oct 24, 2024

showlab / Show-o

Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.

Python 1,011 44 Updated Nov 5, 2024

facebookresearch / chameleon

Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.

Python 1,824 111 Updated Jul 29, 2024

thunlp / LLaVA-UHD

LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images

Python 318 15 Updated Oct 8, 2024

dvlab-research / ControlNeXt

Controllable video and image Generation, SVD, Animate Anyone, ControlNet, ControlNeXt, LoRA

Python 1,395 66 Updated Sep 25, 2024

nlpxucan / WizardLM

LLMs built upon Evol-Instruct: WizardLM, WizardCoder, WizardMath

Python 9,258 717 Updated Aug 5, 2024

facebookresearch / sam2

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…

Jupyter Notebook 12,187 1,114 Updated Oct 14, 2024

naklecha / llama3-from-scratch

llama3 implementation one matrix multiplication at a time

Jupyter Notebook 13,678 1,093 Updated May 23, 2024

OpenGVLab / Diffree

Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model

Python 233 13 Updated Aug 6, 2024

zhangxulu1996 / awesome-personalization

9 1 Updated May 10, 2024

xichenpan / Kosmos-G

Official Implementation of ICLR'24: Kosmos-G: Generating Images in Context with Multimodal Large Language Models

Python 50 3 Updated May 25, 2024

csyxwei / ELITE

ELITE: Encoding Visual Concepts into Textual Embeddings for Customized Text-to-Image Generation (ICCV 2023, Oral)

Python 511 30 Updated Jan 8, 2024

315386775 / DeepLearing-Interview-Awesome-2024

A collection of AIGC-interview/CV-interview/LLMs-interview questions and answers, also covering new ideas, new problems, new resources, and new projects from work and research

1,707 171 Updated Oct 14, 2024

tencent-ailab / IP-Adapter

The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images with image prompt.

Jupyter Notebook 5,229 337 Updated Jun 28, 2024

instantX-research / InstantID

InstantID: Zero-shot Identity-Preserving Generation in Seconds 🔥

Python 11,078 807 Updated Jul 18, 2024

Yangyi-Chen / SOLO

Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling"

Jupyter Notebook 112 3 Updated Sep 21, 2024

Kwai-Kolors / Kolors

Kolors Team

Python 3,824 262 Updated Sep 4, 2024

Yutong-Zhou-cv / Awesome-Text-to-Image

(ෆ`꒳´ෆ) A Survey on Text-to-Image Generation/Synthesis.

2,157 189 Updated Nov 7, 2024

dusty-nv / NanoLLM

Optimized local inference for LLMs with HuggingFace-like APIs for quantization, vision/language models, multimodal agents, speech, vector DB, and RAG.

Python 195 29 Updated Oct 18, 2024

metadriverse / SimGen

Simulator-conditioned Driving Scene Generation

59 4 Updated Jun 29, 2024

ZouHao zouhaoa
