This work presents PixWizard, a versatile image-to-image visual assistant designed for image generation, manipulation, and translation based on free-form user instructions. [📖 Paper]
- ✅ Release the Paper
- ✅ Release the Model
- ✅ Release the Code
- Supported in diffusers
| Resolution | PixWizard Parameters | Text Encoder | VAE Encoder | Prediction | Download URL |
|---|---|---|---|---|---|
| 512 / 768 / 1024 | 2B | Gemma-2B and CLIP-L-336 | SD-XL | Rectified Flow | 🤗 Hugging Face |
- Clone this repository and navigate to the PixWizard folder:

```bash
git clone https://github.com/AFeng-x/PixWizard.git
cd PixWizard
```
- nvcc check

Before installation, ensure that you have a working `nvcc`:

```bash
# The command should work and show a CUDA version number (12.1 in our case).
nvcc --version
```
On some outdated distros (e.g., CentOS 7), you may also want to check that a late enough version of `gcc` is available:

```bash
# The command should work and show a version of at least 6.0.
# If not, consult distro-specific tutorials to obtain a newer version or build it manually.
gcc --version
```
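If you script these checks, a small comparison helper avoids eyeballing version strings. The sketch below is ours, not part of the repo: `version_ge` is a hypothetical helper, and it relies on `sort -V` (available in GNU coreutils and most modern BSDs).

```shell
# version_ge A B — succeed when dotted version A is >= version B.
# Hypothetical helper (not part of the PixWizard repo); relies on `sort -V`.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Example: compare the versions reported by `nvcc --version` / `gcc --version`
# against the versions mentioned in this guide.
version_ge "12.1" "12.1" && echo "CUDA ok"
version_ge "9.4" "6.0" && echo "gcc ok"
```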
- Install packages

```bash
# Create a new conda environment named 'PixWizard'
conda create -n PixWizard -y
# Activate the 'PixWizard' environment
conda activate PixWizard
# Install Python and PyTorch
conda install python=3.11 pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 pytorch-cuda=12.1 -c pytorch -c nvidia -y
# Install required packages from 'requirements.txt'
pip install -r requirements.txt
# Install Flash-Attention
pip install flash-attn --no-build-isolation
```
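A quick preflight can confirm that every tool used in the steps above is on `PATH` before you start. `need_cmd` is a local helper defined here purely for illustration, not something the repo provides:

```shell
# need_cmd NAME — succeed if NAME is an executable on PATH.
# Local helper for a quick preflight; not part of the PixWizard repo.
need_cmd() {
    command -v "$1" >/dev/null 2>&1
}

for tool in git conda nvcc gcc python pip; do
    need_cmd "$tool" || echo "warning: '$tool' not found on PATH"
done
```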
- Inference

Run the following command:

```bash
bash exps/inference_pixwizard.sh
```
- Prepare data
- First, refer to the provided annotation_example to prepare your own training dataset.
- Second, refer to s1.yaml and s2.yaml to register your prepared annotation JSON files.
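The authoritative schema lives in s1.yaml and s2.yaml in the repo; purely as an illustration of "registering your annotation JSON", an entry might look like the sketch below. Every key name and value here is an assumption, not the repo's actual schema — copy the real structure from the provided files.

```yaml
# Hypothetical sketch only — take the real key names from s1.yaml / s2.yaml.
META:
  - path: annotations/my_dataset.json   # your annotation JSON from the previous step (assumed key)
    type: image2image                   # task-type label (assumed key)
```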
- Run training
- Place the downloaded weights for clip-vit-large-patch14-336 in the models/clip directory.
- Update the model paths and the data path in the script, then run it.
If you find our project useful for your research and applications, please kindly cite using this BibTeX:
```bibtex
@article{lin2024pixwizard,
  title={PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions},
  author={Lin, Weifeng and Wei, Xinyu and Zhang, Renrui and Zhuo, Le and Zhao, Shitian and Huang, Siyuan and Xie, Junlin and Qiao, Yu and Gao, Peng and Li, Hongsheng},
  journal={arXiv preprint arXiv:2409.15278},
  year={2024}
}
```