DepictQA: Depicted Image Quality Assessment with Vision Language Models

🌏 Project Page • 📀 Datasets ( huggingface / modelscope )

Official PyTorch implementation of the papers:

DepictQA-Wild (DepictQA-v2): paper, project page.

Zhiyuan You, Jinjin Gu, Zheyuan Li, Xin Cai, Kaiwen Zhu, Chao Dong, Tianfan Xue, "Descriptive Image Quality Assessment in the Wild," arXiv preprint arXiv:2405.18842, 2024.
DepictQA-v1: paper, project page.

Zhiyuan You, Zheyuan Li, Jinjin Gu, Zhenfei Yin, Tianfan Xue, Chao Dong, "Depicting beyond scores: Advancing image quality assessment through multi-modal language models," ECCV, 2024.

Update

📆 [2024.07] DepictQA datasets were released on huggingface / modelscope.

📆 [2024.07] DepictQA-v1 was accepted to ECCV 2024.

📆 [2024.05] We released DepictQA-Wild (DepictQA-v2): a multi-functional in-the-wild descriptive image quality assessment model.

📆 [2023.12] We released DepictQA-v1, a multi-modal image quality assessment model based on vision language models.

Installation

Create environment.

# clone this repo
git clone https://github.com/XPixelGroup/DepictQA.git
cd DepictQA

# create environment
conda create -n depictqa python=3.10
conda activate depictqa
pip install -r requirements.txt

Download pretrained models.
- CLIP-ViT-L-14. Required.
- Vicuna-v1.5-7B. Required.
- All-MiniLM-L6-v2. Required only for confidence estimation of detailed reasoning responses.
- Our pretrained delta checkpoint (see Models). Optional for training. Required for demo and inference.
Ensure that all downloaded models are placed in the designated directories as follows.
```
|-- DepictQA
|-- ModelZoo
    |-- CLIP
        |-- clip
            |-- ViT-L-14.pt
    |-- LLM
        |-- vicuna
            |-- vicuna-7b-v1.5
    |-- SentenceTransformers
        |-- all-MiniLM-L6-v2
```
If models are stored in different directories, revise config.model.vision_encoder_path, config.model.llm_path, and config.model.sentence_model in config.yaml (under the experiments directory) to set new paths.
Move our pretrained delta checkpoint to a specific experiment directory (e.g., DQ495K, DQ495K_QPath) as follows.
```
|-- DepictQA
    |-- experiments
        |-- a_specific_experiment_directory
            |-- ckpt
                |-- ckpt.pt
```
If the delta checkpoint is stored in another directory, revise config.model.delta_path in config.yaml (under the experiments directory) to set the new path.
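Before running a demo or inference, it can help to confirm that the files sit where the trees above expect them. The following is a minimal sketch, not part of the repository: the helper name `find_missing` is hypothetical, and it assumes the default layout (no custom paths in config.yaml) with `DepictQA` and `ModelZoo` as sibling directories.

```python
from pathlib import Path

# Relative paths copied from the directory trees above. The root is the
# folder that contains both DepictQA and ModelZoo (an assumption of this
# sketch; adjust if config.yaml points elsewhere).
EXPECTED_MODEL_PATHS = [
    "ModelZoo/CLIP/clip/ViT-L-14.pt",
    "ModelZoo/LLM/vicuna/vicuna-7b-v1.5",
    # Only needed for confidence estimation of detailed reasoning responses.
    "ModelZoo/SentenceTransformers/all-MiniLM-L6-v2",
]


def find_missing(root, experiment=None):
    """Return the expected paths that do not exist under root."""
    expected = list(EXPECTED_MODEL_PATHS)
    if experiment is not None:
        # Delta checkpoint location inside one experiment directory.
        expected.append(f"DepictQA/experiments/{experiment}/ckpt/ckpt.pt")
    return [p for p in expected if not (Path(root) / p).exists()]


if __name__ == "__main__":
    # Assumes the script is run from the parent directory of DepictQA.
    for path in find_missing(".", experiment="DQ495K"):
        print(f"missing: {path}")
```

An empty result means every default path was found; any printed path either needs to be downloaded or remapped in config.yaml.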

Models

| Training Data | Tune | Hugging Face | Description |
| --- | --- | --- | --- |
| DQ-495K + KonIQ + SPAQ | Abstractor, LoRA | download | Uses a vision abstractor to reduce the number of tokens. Trained on DQ-495K, KonIQ, and SPAQ datasets. Able to handle images with resolution larger than 1000, and able to compare images with different contents. |
| DQ-495K + Q-Instruct | Projector, LoRA | download | Trained on DQ-495K and Q-Instruct (see paper) datasets. Able to answer multiple-choice, yes-or-no, what, and how questions, but degrades in assessment and comparison tasks. |
| DQ-495K + Q-Pathway | Projector, LoRA | download | Trained on DQ-495K and Q-Pathway (see paper) datasets. Performs well on real images, but degrades in comparison tasks. |
| DQ-495K | Projector, LoRA | download | Trained on DQ-495K dataset. Used in our paper. |

Demos

Online Demo

We provide an online demo (coming soon) deployed on huggingface spaces.

Gradio Demo

We provide a gradio demo for local testing.

- Enter a specific experiment directory: `cd experiments/a_specific_experiment_directory`
- Check Installation to make sure (1) the environment is installed, (2) CLIP-ViT-L-14, Vicuna-v1.5-7B, and the pretrained delta checkpoint are downloaded, and (3) their paths are set in config.yaml.
- Launch controller: `sh launch_controller.sh`
- Launch gradio server: `sh launch_gradio.sh`
- Launch DepictQA worker: `sh launch_worker.sh id_of_one_gpu`

You can revise the server config in serve.yaml. The URL of the deployed demo is http://{serve.gradio.host}:{serve.gradio.port}. If serve.yaml is left unchanged, the default URL is http://0.0.0.0:12345.

Note that multiple workers can be launched simultaneously. For each worker, serve.worker.host, serve.worker.port, serve.worker.worker_url, and serve.worker.model_name should be unique.
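When several workers are configured, a quick check for collisions can catch a copy-paste mistake before launch. The sketch below is illustrative only: the helper names and the example values are hypothetical, it does not read serve.yaml, and it interprets "unique" as meaning that no two workers share the same (host, port) pair, worker_url, or model_name, which is an assumption.

```python
from collections import Counter


def gradio_url(host="0.0.0.0", port=12345):
    """Compose the demo URL from serve.gradio.host and serve.gradio.port."""
    return f"http://{host}:{port}"


def find_conflicts(workers):
    """Return (field, value) pairs that more than one worker shares."""
    keys = {
        # Assumption: host and port only need to be unique as a pair.
        "host:port": lambda w: (w["host"], w["port"]),
        "worker_url": lambda w: w["worker_url"],
        "model_name": lambda w: w["model_name"],
    }
    conflicts = []
    for name, key in keys.items():
        counts = Counter(key(w) for w in workers)
        conflicts += [(name, value) for value, n in counts.items() if n > 1]
    return conflicts


# Hypothetical worker settings mirroring the serve.worker.* fields.
workers = [
    {"host": "0.0.0.0", "port": 40000,
     "worker_url": "http://0.0.0.0:40000", "model_name": "DQ495K"},
    {"host": "0.0.0.0", "port": 40001,
     "worker_url": "http://0.0.0.0:40001", "model_name": "DQ495K_QPath"},
]

print(gradio_url())             # default demo URL
print(find_conflicts(workers))  # empty list when nothing collides
```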

Datasets

The source code for constructing the DQ-495K dataset (used in DepictQA-v2) is provided here.
Download the MBAPPS (used in DepictQA-v1) and DQ-495K (used in DepictQA-v2) datasets from huggingface / modelscope. Place the dataset directory next to this repository as follows.
```
|-- DataDepictQA
|-- DepictQA
```
If the dataset is stored in another directory, revise config.data.root_dir in config.yaml (under the experiments directory) to set the new path.

Training

- Enter a specific experiment directory: `cd experiments/a_specific_experiment_directory`
- Check Installation to make sure (1) the environment is installed, (2) CLIP-ViT-L-14 and Vicuna-v1.5-7B are downloaded, and (3) their paths are set in config.yaml.
- Run training: `sh train.sh ids_of_gpus`

Inference

Inference on Our Benchmark

- Enter a specific experiment directory: `cd experiments/a_specific_experiment_directory`
- Check Installation to make sure (1) the environment is installed, (2) CLIP-ViT-L-14, Vicuna-v1.5-7B, and the pretrained delta checkpoint are downloaded, and (3) their paths are set in config.yaml.
- Run a specific infer shell (e.g., infer_A_sd_brief.sh): `sh infer_A_sd_brief.sh id_of_one_gpu`

Inference on Custom Dataset

Construct *.json file for your dataset as follows.

[
    {
        "id": unique id of each sample, required, 
        "image_ref": reference image, null if not applicable, 
        "image_A": image A, null if not applicable, 
        "image_B": image B, null if not applicable, 
        "query": input question, required, 
    }, 
    ...
]
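The template above can be generated programmatically. This is a minimal sketch under stated assumptions: the helper names, the file names, and the query wording are hypothetical, and how image paths are resolved (absolute or relative to some root) is not specified here, so check that against your config before use.

```python
import json


def make_sample(sample_id, query, image_ref=None, image_a=None, image_b=None):
    """Build one entry; unused image fields stay None and serialize to null."""
    return {
        "id": sample_id,
        "image_ref": image_ref,
        "image_A": image_a,
        "image_B": image_b,
        "query": query,
    }


def write_meta(samples, path):
    """Write the list of samples as a *.json meta file."""
    with open(path, "w") as f:
        json.dump(samples, f, indent=4)


# Hypothetical samples: one single-image query and one A/B comparison.
samples = [
    make_sample(0, "Evaluate the quality of the image.", image_a="img_0.png"),
    make_sample(1, "Which image has better quality?",
                image_ref="ref_1.png", image_a="a_1.png", image_b="b_1.png"),
]

if __name__ == "__main__":
    write_meta(samples, "my_dataset.json")
```

Using `None` for missing images matters because `json.dump` writes it as `null`, which is what the template asks for when an image is not applicable.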

- Enter your experiment directory: `cd your_experiment_directory`
- Check Installation to make sure (1) the environment is installed, (2) CLIP-ViT-L-14, Vicuna-v1.5-7B, and the pretrained delta checkpoint are downloaded, and (3) their paths are set in config.yaml.

Construct your infer shell as follows.

#!/bin/bash
src_dir=directory_of_src
export PYTHONPATH=$src_dir:$PYTHONPATH
export CUDA_VISIBLE_DEVICES=$1

python $src_dir/infer.py \
    --meta_path json_path_of_your_dataset \
    --dataset_name your_dataset_name \
    --task_name task_name \
    --batch_size batch_size

--task_name can be set as follows.

| Task Name | Description |
| --- | --- |
| quality_compare | AB comparison in full-reference |
| quality_compare_noref | AB comparison in non-reference |
| quality_single_A | Image A assessment in full-reference |
| quality_single_A_noref | Image A assessment in non-reference |
| quality_single_B | Image B assessment in full-reference |
| quality_single_B_noref | Image B assessment in non-reference |
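The six task names follow a regular pattern: a base name for the target, plus a `_noref` suffix when no reference image is used. A small sketch of that mapping is below; the function `select_task_name` is a hypothetical helper for building the infer shell, not part of the repository.

```python
# Base task names taken from the table above.
BASE_TASKS = {
    "compare": "quality_compare",
    "A": "quality_single_A",
    "B": "quality_single_B",
}


def select_task_name(target, has_reference):
    """Map a target ("compare", "A", or "B") to a --task_name value."""
    if target not in BASE_TASKS:
        raise ValueError(f"unknown target: {target!r}")
    base = BASE_TASKS[target]
    # Non-reference variants append "_noref" to the base name.
    return base if has_reference else base + "_noref"


print(select_task_name("compare", has_reference=True))
print(select_task_name("A", has_reference=False))
```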

Run your infer shell: sh your_infer_shell.sh id_of_one_gpu.

Evaluation

Enter the evaluation directory: cd src/eval.

Various evaluation scripts are explained as follows.

| Script | Description |
| --- | --- |
| `cal_acc_single_distortion.py` | Accuracy of single-distortion identification. |
| `cal_acc_multi_distortion.py` | Accuracy of multi-distortion identification. |
| `cal_acc_rating.py` | Accuracy of instant rating. |
| `cal_gpt4_score_detail_v1.py` | GPT-4 score of detailed reasoning tasks in DepictQA-v1. Treats both prediction and ground truth as assistants, and calculates the relative score of prediction over ground truth. |
| `cal_gpt4_score_detail_v2.py` | GPT-4 score of detailed reasoning tasks in DepictQA-v2. Treats only the prediction as an assistant, and directly assesses the consistency between prediction and ground truth. |

Run basic evaluation (e.g., cal_acc_single_distortion.py):
```
python cal_acc_single_distortion.py --pred_path predict_json_path --gt_path ground_truth_json_path
```
Some specific parameters are explained as follows.

For the calculation of accuracy:
- --confidence (store_true): whether to calculate accuracy within various confidence intervals.
- --intervals (list of float, default [0, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 1]): the boundaries of the confidence intervals, only used when --confidence is set.
For the calculation of GPT-4 score:
- --save_path (str, required): *.json path to save the evaluation results including scores and reasons.
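To illustrate what accuracy within confidence intervals means, here is a minimal sketch. It is not the logic of the evaluation scripts: the function name is hypothetical, and the binning is an assumption (half-open bins between consecutive boundaries, with the last bin also including its upper bound).

```python
DEFAULT_INTERVALS = [0, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 1]


def accuracy_by_confidence(confidences, correct, intervals=DEFAULT_INTERVALS):
    """Return {(low, high): accuracy}, or None for bins with no samples."""
    results = {}
    last = len(intervals) - 2
    for i, (low, high) in enumerate(zip(intervals[:-1], intervals[1:])):
        # Half-open bins [low, high), except the final bin, which includes high.
        hits = [
            ok for conf, ok in zip(confidences, correct)
            if low <= conf < high or (i == last and conf == high)
        ]
        results[(low, high)] = sum(hits) / len(hits) if hits else None
    return results


# Hypothetical per-sample confidences and correctness flags.
confidences = [0.3, 0.55, 0.97, 1.0]
correct = [False, True, True, False]
for bounds, acc in accuracy_by_confidence(confidences, correct).items():
    print(bounds, acc)
```

Reporting `None` for empty bins keeps them distinguishable from bins where every prediction was wrong.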

Acknowledgement

This repository is based on LAMM. Thanks for this awesome work.

BibTeX

If you find our work useful for your research and applications, please cite using the BibTeX:

@article{depictqa_v2,
    title={Descriptive Image Quality Assessment in the Wild},
    author={You, Zhiyuan and Gu, Jinjin and Li, Zheyuan and Cai, Xin and Zhu, Kaiwen and Dong, Chao and Xue, Tianfan},
    journal={arXiv preprint arXiv:2405.18842},
    year={2024}
}


@article{depictqa_v1,
    title={Depicting Beyond Scores: Advancing Image Quality Assessment through Multi-modal Language Models},
    author={You, Zhiyuan and Li, Zheyuan and Gu, Jinjin and Yin, Zhenfei and Xue, Tianfan and Dong, Chao},
    journal={arXiv preprint arXiv:2312.08962},
    year={2023}
}
