🌏 Project Page • 📀 Datasets ( huggingface / modelscope )
Official PyTorch implementation of the papers:
- DepictQA-Wild (DepictQA-v2): paper, project page.
  Zhiyuan You, Jinjin Gu, Zheyuan Li, Xin Cai, Kaiwen Zhu, Chao Dong, Tianfan Xue, "Descriptive Image Quality Assessment in the Wild," arXiv preprint arXiv:2405.18842, 2024.
- DepictQA-v1: paper, project page.
  Zhiyuan You, Zheyuan Li, Jinjin Gu, Zhenfei Yin, Tianfan Xue, Chao Dong, "Depicting beyond scores: Advancing image quality assessment through multi-modal language models," ECCV, 2024.
📆 [2024.07] DepictQA datasets were released on huggingface / modelscope.
📆 [2024.07] DepictQA-v1 was accepted to ECCV 2024.
📆 [2024.05] We released DepictQA-Wild (DepictQA-v2): a multi-functional in-the-wild descriptive image quality assessment model.
📆 [2023.12] We released DepictQA-v1, a multi-modal image quality assessment model based on vision language models.
- Create environment.

      # clone this repo
      git clone https://github.com/XPixelGroup/DepictQA.git
      cd DepictQA

      # create environment
      conda create -n depictqa python=3.10
      conda activate depictqa
      pip install -r requirements.txt
- Download pretrained models.
  - CLIP-ViT-L-14. Required.
  - Vicuna-v1.5-7B. Required.
  - All-MiniLM-L6-v2. Required only for confidence estimation of detailed reasoning responses.
  - Our pretrained delta checkpoint (see Models). Optional for training. Required for demo and inference.
- Ensure that all downloaded models are placed in the designated directories as follows.

      |-- DepictQA
      |-- ModelZoo
          |-- CLIP
              |-- clip
                  |-- ViT-L-14.pt
          |-- LLM
              |-- vicuna
                  |-- vicuna-7b-v1.5
          |-- SentenceTransformers
              |-- all-MiniLM-L6-v2

  If models are stored in different directories, revise config.model.vision_encoder_path, config.model.llm_path, and config.model.sentence_model in config.yaml (under the experiments directory) to set new paths.
- Move our pretrained delta checkpoint to a specific experiment directory (e.g., DQ495K, DQ495K_QPath) as follows.

      |-- DepictQA
          |-- experiments
              |-- a_specific_experiment_directory
                  |-- ckpt
                      |-- ckpt.pt

  If the delta checkpoint is stored in another directory, revise config.model.delta_path in config.yaml (under the experiments directory) to set the new path.
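
Before running anything, it can help to confirm that the layout above is in place. The following is a minimal sketch and not part of the repository: the helper name `missing_model_paths` is hypothetical, and it assumes the default layout in which `ModelZoo` sits in the directory passed as `root`. If you changed the paths in config.yaml, adjust the list accordingly.

```python
from pathlib import Path

# Default relative locations, copied from the directory tree above.
# All-MiniLM-L6-v2 is only needed for confidence estimation, so a missing
# entry for it is not necessarily an error.
EXPECTED_MODEL_PATHS = [
    "ModelZoo/CLIP/clip/ViT-L-14.pt",
    "ModelZoo/LLM/vicuna/vicuna-7b-v1.5",
    "ModelZoo/SentenceTransformers/all-MiniLM-L6-v2",
]


def missing_model_paths(root, expected=EXPECTED_MODEL_PATHS):
    """Return the expected paths that do not exist under `root` (hypothetical helper)."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]


# Example usage, with `root` being the directory that holds ModelZoo:
# for rel in missing_model_paths("/path/to/root"):
#     print(f"missing: {rel}")
```

An empty list means every default path was found.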
Training Data | Tune | Hugging Face | Description |
---|---|---|---|
DQ-495K + KonIQ + SPAQ | Abstractor, LORA | download | Uses a vision abstractor to reduce the number of tokens. Trained on DQ-495K, KonIQ, and SPAQ datasets. Able to handle images with resolution larger than 1000, and able to compare images with different contents. |
DQ-495K + Q-Instruct | Projector, LORA | download | Trained on DQ-495K and Q-Instruct (see paper) datasets. Able to answer multiple-choice, yes-or-no, what, and how questions, but degrades in assessment and comparison tasks. |
DQ-495K + Q-Pathway | Projector, LORA | download | Trained on DQ-495K and Q-Pathway (see paper) datasets. Performs well on real images, but degrades in comparison tasks. |
DQ-495K | Projector, LORA | download | Trained on DQ-495K dataset. Used in our paper. |
We provide an online demo (coming soon) deployed on huggingface spaces.
We provide a gradio demo for local testing.
- cd into a specific experiment directory:

      cd experiments/a_specific_experiment_directory

- Check Installation to make sure (1) the environment is installed, (2) CLIP-ViT-L-14, Vicuna-v1.5-7B, and the pretrained delta checkpoint are downloaded, and (3) their paths are set in config.yaml.
- Launch controller:

      sh launch_controller.sh

- Launch gradio server:

      sh launch_gradio.sh

- Launch DepictQA worker:

      sh launch_worker.sh id_of_one_gpu
You can revise the server config in serve.yaml. The URL of the deployed demo will be http://{serve.gradio.host}:{serve.gradio.port}. The default URL is http://0.0.0.0:12345 if you do not revise serve.yaml.
Note that multiple workers can be launched simultaneously. For each worker, serve.worker.host, serve.worker.port, serve.worker.worker_url, and serve.worker.model_name should be unique.
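
The URL rule and the uniqueness requirement above can be sketched as follows. This is an illustration only, not code from the repository: it assumes the nesting of serve.yaml mirrors the dotted names (`serve.gradio.host`, and so on) and works on plain dictionaries as they might look after loading the file. The function names are hypothetical, and treating host and port as a pair is an interpretation of the uniqueness rule.

```python
def gradio_url(serve_cfg):
    """Build the demo URL from the gradio host and port."""
    gradio = serve_cfg["serve"]["gradio"]
    return f"http://{gradio['host']}:{gradio['port']}"


def duplicated_worker_fields(workers):
    """Return the names of settings shared by two or more workers.

    `workers` is a list of dicts with the keys host, port, worker_url,
    and model_name (one dict per worker).
    """
    duplicated = []
    for field in ("worker_url", "model_name"):
        values = [w[field] for w in workers]
        if len(set(values)) != len(values):
            duplicated.append(field)
    # Assumption: a (host, port) pair identifies a listening address,
    # so it is checked as a pair rather than field by field.
    pairs = [(w["host"], w["port"]) for w in workers]
    if len(set(pairs)) != len(pairs):
        duplicated.append("host:port")
    return duplicated


# Default values stated above.
default_cfg = {"serve": {"gradio": {"host": "0.0.0.0", "port": 12345}}}
print(gradio_url(default_cfg))  # http://0.0.0.0:12345
```

An empty list from `duplicated_worker_fields` means no two workers collide on the checked settings.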
- Source code for DQ-495K (used in DepictQA-v2) dataset construction is provided here.
- Download MBAPPS (used in DepictQA-v1) and DQ-495K (used in DepictQA-v2) datasets from huggingface / modelscope. Place the dataset directory next to this repository as follows.

      |-- DataDepictQA
      |-- DepictQA

  If the dataset is stored in another directory, revise config.data.root_dir in config.yaml (under the experiments directory) to set the new path.
- cd into a specific experiment directory:

      cd experiments/a_specific_experiment_directory

- Check Installation to make sure (1) the environment is installed, (2) CLIP-ViT-L-14 and Vicuna-v1.5-7B are downloaded, and (3) their paths are set in config.yaml.
- Run training:

      sh train.sh ids_of_gpus
- cd into a specific experiment directory:

      cd experiments/a_specific_experiment_directory

- Check Installation to make sure (1) the environment is installed, (2) CLIP-ViT-L-14, Vicuna-v1.5-7B, and the pretrained delta checkpoint are downloaded, and (3) their paths are set in config.yaml.
- Run a specific infer shell (e.g., infer_A_sd_brief.sh):

      sh infer_A_sd_brief.sh id_of_one_gpu
- Construct a *.json file for your dataset as follows.

      [
          {
              "id": unique id of each sample, required,
              "image_ref": reference image, null if not applicable,
              "image_A": image A, null if not applicable,
              "image_B": image B, null if not applicable,
              "query": input question, required,
          },
          ...
      ]
- cd into your experiment directory:

      cd your_experiment_directory

- Check Installation to make sure (1) the environment is installed, (2) CLIP-ViT-L-14, Vicuna-v1.5-7B, and the pretrained delta checkpoint are downloaded, and (3) their paths are set in config.yaml.
- Construct your infer shell as follows.

      #!/bin/bash
      src_dir=directory_of_src
      export PYTHONPATH=$src_dir:$PYTHONPATH
      export CUDA_VISIBLE_DEVICES=$1

      python $src_dir/infer.py \
          --meta_path json_path_of_your_dataset \
          --dataset_name your_dataset_name \
          --task_name task_name \
          --batch_size batch_size
  --task_name can be set as follows.

  Task Name | Description |
  ---|---|
  quality_compare | AB comparison in full-reference |
  quality_compare_noref | AB comparison in non-reference |
  quality_single_A | Image A assessment in full-reference |
  quality_single_A_noref | Image A assessment in non-reference |
  quality_single_B | Image B assessment in full-reference |
  quality_single_B_noref | Image B assessment in non-reference |

- Run your infer shell:

      sh your_infer_shell.sh id_of_one_gpu
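
The dataset file described in the steps above can be generated with a few lines of Python. This is a hedged sketch rather than repository code: `make_sample`, `write_meta`, and `check_task_name` are hypothetical helpers, the file names and query text are placeholders, and how image entries are resolved on disk is not covered here. Only the field names and task names come from the descriptions above.

```python
import json

# Task names listed for --task_name above.
TASK_NAMES = {
    "quality_compare", "quality_compare_noref",
    "quality_single_A", "quality_single_A_noref",
    "quality_single_B", "quality_single_B_noref",
}


def make_sample(sample_id, query, image_ref=None, image_A=None, image_B=None):
    """Build one entry; images that do not apply stay None (JSON null)."""
    if not query:
        raise ValueError("query is required")
    return {
        "id": sample_id,
        "image_ref": image_ref,
        "image_A": image_A,
        "image_B": image_B,
        "query": query,
    }


def write_meta(samples, path):
    """Write the sample list to `path`, refusing duplicate ids."""
    ids = [s["id"] for s in samples]
    if len(set(ids)) != len(ids):
        raise ValueError("sample ids must be unique")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(samples, f, indent=2)


def check_task_name(task_name):
    """Raise early on a task name that is not in the list above."""
    if task_name not in TASK_NAMES:
        raise ValueError(f"unknown task name: {task_name}")
    return task_name


# Placeholder values for a non-reference assessment of image A:
# samples = [make_sample(0, "Evaluate the quality of the image.", image_A="a.png")]
# write_meta(samples, "my_dataset.json")
```

The resulting path is what `--meta_path` would point to.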
- cd into the evaluation directory:

      cd src/eval

- Various evaluation scripts are explained as follows.

  Script | Description |
  ---|---|
  cal_acc_single_distortion.py | accuracy of single-distortion identification |
  cal_acc_multi_distortion.py | accuracy of multi-distortion identification |
  cal_acc_rating.py | accuracy of instant rating |
  cal_gpt4_score_detail_v1.py | GPT-4 score of detailed reasoning tasks in DepictQA-v1. Treat both prediction and ground truth as assistants, calculate the relative score of prediction over ground truth. |
  cal_gpt4_score_detail_v2.py | GPT-4 score of detailed reasoning tasks in DepictQA-v2. Only treat prediction as an assistant, directly assess the consistency between prediction and ground truth. |

- Run basic evaluation (e.g., cal_acc_single_distortion.py):

      python cal_acc_single_distortion.py --pred_path predict_json_path --gt_path ground_truth_json_path
  Some specific parameters are explained as follows.

  For the calculation of accuracy:
  - --confidence (store_true): whether to calculate accuracy within various confidence intervals.
  - --intervals (list of float, default [0, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 1]): the confidence intervals, only valid when --confidence is true.

  For the calculation of GPT-4 score:
  - --save_path (str, required): *.json path to save the evaluation results including scores and reasons.
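
To make the idea of accuracy within confidence intervals concrete, here is a small sketch. It is not the implementation used by the evaluation scripts: the input format (pairs of confidence and correctness), the function name, and the binning convention (half-open bins, with the last bin also including its upper bound) are all assumptions chosen for illustration. Only the default interval boundaries come from the description above.

```python
DEFAULT_INTERVALS = [0, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 1]


def accuracy_by_confidence(records, intervals=DEFAULT_INTERVALS):
    """Compute accuracy per confidence bin.

    records: iterable of (confidence, is_correct) pairs.
    Returns {(low, high): accuracy}, with None for empty bins.
    """
    records = list(records)
    last = len(intervals) - 2
    results = {}
    for i, (low, high) in enumerate(zip(intervals[:-1], intervals[1:])):
        # Assumed convention: [low, high), except the last bin, which is [low, high].
        hits = [
            ok for conf, ok in records
            if low <= conf < high or (i == last and conf == high)
        ]
        results[(low, high)] = sum(hits) / len(hits) if hits else None
    return results


# Made-up records for illustration only.
example = [(0.3, True), (0.4, False), (0.92, True), (1.0, True), (0.95, False)]
for bounds, acc in accuracy_by_confidence(example).items():
    print(bounds, acc)
```

Reporting accuracy per bin shows whether higher-confidence responses are in fact more often correct.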
This repository is based on LAMM. Thanks for this awesome work.
If you find our work useful for your research and applications, please cite it using the following BibTeX:
@article{depictqa_v2,
title={Descriptive Image Quality Assessment in the Wild},
author={You, Zhiyuan and Gu, Jinjin and Li, Zheyuan and Cai, Xin and Zhu, Kaiwen and Dong, Chao and Xue, Tianfan},
journal={arXiv preprint arXiv:2405.18842},
year={2024}
}
@article{depictqa_v1,
title={Depicting Beyond Scores: Advancing Image Quality Assessment through Multi-modal Language Models},
author={You, Zhiyuan and Li, Zheyuan and Gu, Jinjin and Yin, Zhenfei and Xue, Tianfan and Dong, Chao},
journal={arXiv preprint arXiv:2312.08962},
year={2023}
}