Personal Project: MPP-Qwen14B & MPP-Qwen-Next (Multimodal Pipeline Parallel based on Qwen-LM). Supports [video/image/multi-image] {sft/conversations}. Don't let poverty limit your imagination! Tr…

Jupyter Notebook 360 20 Updated Sep 24, 2024

black-forest-labs / flux

Official inference repo for FLUX.1 models

Python 14,461 1,039 Updated Oct 3, 2024

QwenLM / Qwen2-VL

Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Python 2,458 138 Updated Oct 4, 2024

rasbt / LLMs-from-scratch

Implementing a ChatGPT-like LLM in PyTorch from scratch, step by step

Jupyter Notebook 28,237 3,220 Updated Oct 5, 2024

facebookresearch / sapiens

High-resolution models for human tasks.

Python 4,128 218 Updated Oct 3, 2024

showlab / Show-o

Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.

Python 882 39 Updated Sep 30, 2024

facebookresearch / chameleon

Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.

Python 1,779 108 Updated Jul 29, 2024

thunlp / LLaVA-UHD

LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images

Python 303 15 Updated Sep 18, 2024

dvlab-research / ControlNeXt

Controllable video and image generation, SVD, Animate Anyone, ControlNet, ControlNeXt, LoRA

Python 1,299 60 Updated Sep 25, 2024

nlpxucan / WizardLM

LLMs built upon Evol-Instruct: WizardLM, WizardCoder, WizardMath

Python 9,213 715 Updated Aug 5, 2024

facebookresearch / sam2

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…

Jupyter Notebook 11,329 970 Updated Oct 5, 2024

naklecha / llama3-from-scratch

llama3 implementation one matrix multiplication at a time

Jupyter Notebook 13,234 1,062 Updated May 23, 2024

OpenGVLab / Diffree

Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model

Python 222 13 Updated Aug 6, 2024

zhangxulu1996 / awesome-personalization

8 1 Updated May 10, 2024

xichenpan / Kosmos-G

Official Implementation of ICLR'24: Kosmos-G: Generating Images in Context with Multimodal Large Language Models

Python 46 3 Updated May 25, 2024

csyxwei / ELITE

ELITE: Encoding Visual Concepts into Textual Embeddings for Customized Text-to-Image Generation (ICCV 2023, Oral)

Python 511 30 Updated Jan 8, 2024

315386775 / DeepLearing-Interview-Awesome-2024

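A collection of AIGC-interview/CV-interview/LLMs-interview questions and answers, also covering new ideas, new problems, new resources, and new projects encountered in work and research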

1,597 165 Updated Sep 29, 2024

tencent-ailab / IP-Adapter

The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images with an image prompt.

Jupyter Notebook 5,085 331 Updated Jun 28, 2024

instantX-research / InstantID

InstantID: Zero-shot Identity-Preserving Generation in Seconds 🔥

Python 10,959 800 Updated Jul 18, 2024

Yangyi-Chen / SOLO

Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling"

Jupyter Notebook 106 2 Updated Sep 21, 2024

Kwai-Kolors / Kolors

Kolors Team

Python 3,666 242 Updated Sep 4, 2024

Yutong-Zhou-cv / Awesome-Text-to-Image

(ෆ`꒳´ෆ) A Survey on Text-to-Image Generation/Synthesis.

2,103 187 Updated Aug 20, 2024

dusty-nv / NanoLLM

Optimized local inference for LLMs with HuggingFace-like APIs for quantization, vision/language models, multimodal agents, speech, vector DB, and RAG.

Python 178 26 Updated Sep 2, 2024

metadriverse / SimGen

Simulator-conditioned Driving Scene Generation

48 3 Updated Jun 29, 2024

cambrian-mllm / cambrian

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

Python 1,705 112 Updated Sep 19, 2024

SkalskiP / top-cvpr-2024-papers

This repository is a curated collection of the most exciting and influential CVPR 2024 papers. 🔥 [Paper + Code + Demo]

Python 636 58 Updated Jun 24, 2024

ZouHao zouhaoa

Lists (6)

BEV Perception

diffusion

🔮 Future ideas

LLM

Multimodality

radar

Starred repositories

Awesome Lists

3d-object-detection