This work presents PixWizard, a versatile image-to-image visual assistant designed for image generation, manipulation, and translation based on free-form user instructions. [📖 Paper]
- ✅ Release the Paper
- ✅ Release the Model
- ✅ Release the Code
- Supported in diffusers
| Resolution | PixWizard Parameters | Text Encoder | VAE Encoder | Prediction | Download URL |
|---|---|---|---|---|---|
| 512 / 768 / 1024 | 2B | Gemma-2B and CLIP-L-336 | SD-XL | Rectified Flow | 🤗 Hugging Face |
- Clone this repository and navigate to the PixWizard folder:

```bash
git clone https://github.com/AFeng-x/PixWizard.git
cd PixWizard
```
- nvcc check

Before installation, ensure that you have a working `nvcc`:

```bash
# The command should work and show a CUDA version number (12.1 in our case).
nvcc --version
```
On some outdated distros (e.g., CentOS 7), you may also want to check that a late enough version of `gcc` is available:

```bash
# The command should work and show a version of at least 6.0.
# If not, consult distro-specific tutorials to obtain a newer version or build it manually.
gcc --version
```
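If you script these checks, a small comparison helper avoids eyeballing version strings. The sketch below is ours, not part of the repo: `version_ge` is a hypothetical helper, and it relies on `sort -V` (available in GNU coreutils and most modern BSDs).

```shell
# version_ge A B — succeed when dotted version A is >= version B.
# Hypothetical helper (not part of the PixWizard repo); relies on `sort -V`.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Example: compare the versions reported by `nvcc --version` / `gcc --version`
# against the versions mentioned in this guide.
version_ge "12.1" "12.1" && echo "CUDA ok"
version_ge "9.4" "6.0" && echo "gcc ok"
```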
- Install packages

```bash
# Create a new conda environment named 'PixWizard'
conda create -n PixWizard -y
# Activate the 'PixWizard' environment
conda activate PixWizard
# Install Python and PyTorch
conda install python=3.11 pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 pytorch-cuda=12.1 -c pytorch -c nvidia -y
# Install required packages from 'requirements.txt'
pip install -r requirements.txt
# Install Flash-Attention
pip install flash-attn --no-build-isolation
```
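A quick preflight can confirm that every tool used in the steps above is on `PATH` before you start. `need_cmd` is a local helper defined here purely for illustration, not something the repo provides:

```shell
# need_cmd NAME — succeed if NAME is an executable on PATH.
# Local helper for a quick preflight; not part of the PixWizard repo.
need_cmd() {
    command -v "$1" >/dev/null 2>&1
}

for tool in git conda nvcc gcc python pip; do
    need_cmd "$tool" || echo "warning: '$tool' not found on PATH"
done
```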
- Inference

Run the following command:

```bash
bash exps/inference_pixwizard.sh
```
- Prepare data
- First, refer to the provided annotation_example to prepare your own training dataset.
- Second, refer to s1.yaml and s2.yaml to register your prepared annotation JSON files.
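The authoritative schema lives in s1.yaml and s2.yaml in the repo; purely as an illustration of "registering your annotation JSON", an entry might look like the sketch below. Every key name and value here is an assumption, not the repo's actual schema — copy the real structure from the provided files.

```yaml
# Hypothetical sketch only — take the real key names from s1.yaml / s2.yaml.
META:
  - path: annotations/my_dataset.json   # your annotation JSON from the previous step (assumed key)
    type: image2image                   # task-type label (assumed key)
```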
- Run training
- Place the downloaded weights for clip-vit-large-patch14-336 in the models/clip directory.
- Update the model paths and the data path in the script, then run it.
If you find our project useful for your research and applications, please kindly cite using this BibTeX:
```bibtex
@article{lin2024pixwizard,
  title={PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions},
  author={Lin, Weifeng and Wei, Xinyu and Zhang, Renrui and Zhuo, Le and Zhao, Shitian and Huang, Siyuan and Xie, Junlin and Qiao, Yu and Gao, Peng and Li, Hongsheng},
  journal={arXiv preprint arXiv:2409.15278},
  year={2024}
}
```