This project provides zero-shot classification on the ILSVRC (ImageNet) dataset using a large-scale multi-modal model pretrained on the Noah-Wukong dataset. The model variants are as follows:
Models | Embedding dimension | Image encoder | Similarity | # vis_token | Checkpoints
---|---|---|---|---|---
Wukong_ViT-B^G | 512 | ViT-B/32 | Global | / | download
Wukong_ViT-B^F | 512 | ViT-B/32 | Token-wise | / | download
Wukong_ViT-B | 512 | ViT-B/32 | Token-wise | 12 | download
Wukong_ViT-L^G | 768 | ViT-L/14 | Global | / | download
Wukong_ViT-L^F | 768 | ViT-L/14 | Token-wise | / | download
Wukong_ViT-L | 768 | ViT-L/14 | Token-wise | 24 | download
For more benchmarks of the multi-modal model, please refer to the Noah-Wukong Benchmark.
- Hardware
  - Ascend processor
- Framework
  - MindSpore
- Tutorial
- Download the ILSVRC dataset and organize the files as follows (a quick layout-check sketch follows the tree below):
.
└── data_root
├── class1
│ ├── 000000000001.jpg
│ ├── 000000000002.jpg
│ ├── ...
├── class2
│ ├── 000000000001.jpg
│ ├── 000000000002.jpg
│ ├── ...
├── class3
│ ├── 000000000001.jpg
│ ├── 000000000002.jpg
│ ├── ...
├── ...
└── classN
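To sanity-check the layout before running evaluation, a minimal sketch like the following can help. The `check_layout` helper is ours for illustration, not part of the repo:

```python
import os

def check_layout(data_root):
    """Print how many class folders and .jpg images sit under data_root."""
    classes = [d for d in os.listdir(data_root)
               if os.path.isdir(os.path.join(data_root, d))]
    if not classes:
        raise ValueError(f"no class folders found under {data_root}")
    n_images = sum(
        sum(1 for f in os.listdir(os.path.join(data_root, c))
            if f.lower().endswith(".jpg"))
        for c in classes)
    print(f"{len(classes)} classes, {n_images} images")

check_layout("/path/to/data_root")
```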
- Download the corresponding Chinese class name file imagenet_class_name_zh.json and place it in the same folder as eval.py.
- Download the following files and place them under src/tools/:
- English: bpe_simple_vocab_16e6.txt.gz
- Chinese: vocab_zh.txt
- Download the prompt file zh_templates.txt to src/tools/. This file defines the prompts used in the zero-shot classification task; the number of prompts can be adjusted to trade off evaluation time against accuracy. Custom prompts are also allowed (see the sketch below).
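As a rough sketch of how such templates are typically consumed, assuming each line of zh_templates.txt is one CLIP-style template containing a `{}` placeholder for the class name (check the actual file format before relying on this):

```python
# A minimal sketch, assuming one template per line with a `{}` placeholder.
with open("src/tools/zh_templates.txt", encoding="utf-8") as f:
    templates = [line.strip() for line in f if line.strip()]

class_name = "金鱼"  # hypothetical entry from imagenet_class_name_zh.json
prompts = [t.format(class_name) for t in templates]
print(len(prompts), prompts[:3])
```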
- Download the corresponding pretrained checkpoint files via the links in the table above.
- Run eval.py to perform zero-shot classification; each model has its own config file under the src/config/ folder.
python eval.py --config_path [config_path] --ckpt_path [ckpt_path] --dataset_path [/path/to/data_root] --batch_size [batch size]
The evaluation output looks like this:
INFO:main:correct @1: 51.51; correct @5: 78.33
Detailed zero-shot classification performance is shown below:
Model | single@1 | single@5 | embed(80)@1 | embed(80)@5
---|---|---|---|---
ViT-B-G | 44.68 | 71.19 | 47.32 | 74.3 |
ViT-B-F | 32.53 | 57.51 | 37.17 | 63.22 |
ViT-B | 45.22 | 70.69 | 48.24 | 73.43 |
ViT-L-G | 56.15 | 79.86 | 57.54 | 81.46 |
ViT-L-F | 49.74 | 76.3 | 52.83 | 78.88 |
ViT-L | 50.22 | 74.79 | 54.43 | 80.1 |
The Wukong 100m dataset files can be downloaded from Wukong; the file structure should look like this:
.
└── data_root
└─wukong_release
├─ wukong_100m_0.csv
├─ wukong_100m_1.csv
├─ wukong_100m_2.csv
├─ ....
└─ wukong_100m_255.csv
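To get a quick look at what one annotation shard contains before downloading images, something like this works; we make no assumption about the column names and simply print whatever header the csv declares:

```python
import pandas as pd

# Peek at the first few rows of one annotation shard.
df = pd.read_csv("/path/to/data_root/wukong_release/wukong_100m_0.csv", nrows=5)
print(df.columns.tolist())
print(df.head())
```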
We provide a multi-threaded Python script for downloading the images listed in the annotation files.
cd models/research/mm/wukong/src/dataset/
python wukong_download.py --csv_dir /path/to/data_root/wukong_release/ --img_dir IMG_DIR [--start_id 0] [--end_id -1] [--thread_num 4]
where IMG_DIR refers to the directory for downloaded images; the options start_id and end_id define the start and end indices of the csv files to download, and thread_num defines the number of threads used for parallel downloading. If these options are not provided, the default setting downloads images from all csv files. Each csv file corresponds to a subdirectory under IMG_DIR, and the final structure looks like this (a simplified sketch of the download loop follows the tree):
.
└── IMG_DIR
├─000
│ ├─ 00000.jpg
│ ├─ 00001.jpg
│ ├─ 00002.jpg
│ └─ ......
├─001
├─002
├─...
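For reference, the core of such a multi-threaded downloader boils down to the pattern below. This is a simplified sketch, not the actual wukong_download.py, and the csv column name `url` is an assumption; adjust it to the real header:

```python
import csv
import os
from concurrent.futures import ThreadPoolExecutor

import requests

def fetch(url, dest):
    """Download a single image, skipping files that already exist."""
    if os.path.exists(dest):
        return
    try:
        resp = requests.get(url, timeout=10)
        resp.raise_for_status()
        with open(dest, "wb") as f:
            f.write(resp.content)
    except requests.RequestException:
        pass  # dead links are common in web-crawled data; just skip them

def download_csv(csv_path, sub_dir, thread_num=4):
    """Download every image referenced by one annotation csv into sub_dir."""
    os.makedirs(sub_dir, exist_ok=True)
    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    with ThreadPoolExecutor(max_workers=thread_num) as pool:
        for i, row in enumerate(rows):
            # "url" is an assumed column name; check the real csv header.
            pool.submit(fetch, row["url"], os.path.join(sub_dir, f"{i:05d}.jpg"))
```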
To use the data in MindSpore, we convert the raw data into MindRecord format. To do this, run:
cd models/research/mm/wukong/
python -m src.dataset.generate_dataset --csv_dir /path/to/data_root/wukong_release/ --img_dir IMG_DIR --data_record_dir DATA_RECORD_DIR [--shard_num 10] [--worker_num 4] [--block_size 2000]
Here DATA_RECORD_DIR refers to the path where the mindrecord files will be generated; shard_num refers to the number of files the mindrecord is split into; worker_num refers to the number of workers used for conversion, and block_size defines the block size of each write. After execution, the mindrecord files should look like this (a minimal FileWriter sketch follows the tree):
└─DATA_RECORD_DIR
├─ wukong100m.mindrecord0
├─ wukong100m.mindrecord0.db
├─ ....
├─ wukong100m.mindrecord9
└─ wukong100m.mindrecord9.db
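Under the hood, MindRecord conversion follows MindSpore's standard FileWriter pattern. The sketch below is illustrative; the schema fields are our assumption, not necessarily the repo's actual schema:

```python
from mindspore.mindrecord import FileWriter

# Illustrative schema: raw jpeg bytes plus the paired caption text.
schema = {"image": {"type": "bytes"}, "text": {"type": "string"}}

writer = FileWriter(file_name="wukong100m.mindrecord", shard_num=10)
writer.add_schema(schema, "wukong image-text pairs")

# Write one hypothetical sample; the real script loops over all images.
with open("/path/to/IMG_DIR/000/00000.jpg", "rb") as f:
    sample = {"image": f.read(), "text": "a hypothetical caption"}
writer.write_raw_data([sample])
writer.commit()
```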
Then you can load the dataset in the standard way, e.g. with the get_wukong_dataset function in models/research/mm/wukong/src/dataset/dataset.py.
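Alternatively, the converted files can be read directly with MindSpore's built-in MindDataset; a minimal sketch, assuming the column names above match the schema used during conversion:

```python
import mindspore.dataset as ds

# Pointing at one shard file loads the whole sharded mindrecord set.
dataset = ds.MindDataset("DATA_RECORD_DIR/wukong100m.mindrecord0",
                         columns_list=["image", "text"],
                         num_parallel_workers=4,
                         shuffle=True)

for sample in dataset.create_dict_iterator(output_numpy=True):
    print(sample["text"])
    break
```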