Word2Scene: Efficient remote sensing image scene generation with only one word via hybrid intelligence and low-rank representation

🔥 Updates

  • [2024.12.11] We have released the weight file of Word2Scene; please follow the steps in 2.1. Instructions for Obtaining Model Weights to obtain it.
  • [2024.11.03] We have released the evaluation script and pre-trained model for GSHPS, feel free to try it out.

⭐ Visualization

Demo videos: GUI.english.version.mp4, 2184339932-commerciallorageographicscenegenerationallv100000101.mp4

Table of Contents

  1. Preparation
  2. Using Word2Scene to Generate Remote Sensing Images
  3. Evaluate Generated Images with GSHPS
  4. Examples
  5. Interpretability of Word2Scene
  6. Paper
  7. Acknowledgement
  8. License

1. Preparation

  • Install required packages: pip install -r requirements.txt

2. Using Word2Scene to Generate Remote Sensing Images

To clone the Stable Diffusion web UI to your local machine, use the following command:

git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui -b v1.6.1

In webui-user.bat, add the following parameters to COMMANDLINE_ARGS:

set COMMANDLINE_ARGS=--enable-console-prompts --api --api-log

Download the Stable Diffusion v1.5 pre-trained model (v1-5-pruned.ckpt) and place it in the models\Stable-diffusion folder.

Note: After downloading, verify that the model file is complete. The MD5 checksum of v1-5-pruned.ckpt is fde08ee6f4fac7ab26592bf519cbb405.
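
To verify the checksum, here is a minimal Python sketch (the file path below is an assumption; adjust it to your local layout):

```python
import hashlib

# Assumed location of the downloaded checkpoint; adjust as needed.
CKPT_PATH = r"models\Stable-diffusion\v1-5-pruned.ckpt"
EXPECTED_MD5 = "fde08ee6f4fac7ab26592bf519cbb405"

md5 = hashlib.md5()
with open(CKPT_PATH, "rb") as f:
    # Read in chunks so the multi-GB checkpoint is never fully in memory.
    for chunk in iter(lambda: f.read(1 << 20), b""):
        md5.update(chunk)

print("OK" if md5.hexdigest() == EXPECTED_MD5 else "Checksum mismatch: re-download the file")
```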

Then, download the Word2Scene model we provided (please refer to 2.1. Instructions for Obtaining Model Weights) and place it in the models\Lora folder.

Launch the Stable Diffusion web UI and set the Stable Diffusion checkpoint to v1-5-pruned.ckpt.

Run the Word2Scene GUI:

python word2scene_gui.py

In Word2Scene GUI, input any textual description or scene concept for the remote sensing imagery you want to generate.

Click the Generate button to create the corresponding remote sensing image.
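
Because the web UI is launched with --api, you can also generate images programmatically instead of through the GUI. Below is a minimal sketch using the web UI's txt2img endpoint; the LoRA name word2scene in the prompt tag is a placeholder for whatever filename you gave the downloaded weights:

```python
import base64
import requests

# Default local address of the Stable Diffusion web UI; adjust host/port if needed.
URL = "http://127.0.0.1:7860/sdapi/v1/txt2img"

payload = {
    # <lora:word2scene:1> assumes the weight file in models\Lora is named
    # word2scene.safetensors; replace with your actual filename.
    "prompt": "a remote sensing image of a dense residential area <lora:word2scene:1>",
    "steps": 30,
    "width": 512,
    "height": 512,
}

response = requests.post(URL, json=payload, timeout=300)
response.raise_for_status()

# The API returns generated images as base64-encoded strings.
with open("generated_scene.png", "wb") as f:
    f.write(base64.b64decode(response.json()["images"][0]))
```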

2.1. Instructions for Obtaining Model Weights

This is an ongoing project. To track and manage the model, and to better support and improve this project, we are not releasing the model weight files publicly at this time. If you need to use Word2Scene, please apply for access by following the steps below:

1. Fill out the application email: Please fill in your relevant information according to the application form template below.

2. Send the application email: Send the completed application form to our contact email: jaycecd@foxmail.com.

3. Obtain the download password: After receiving your application email, we will review your information as soon as possible and reply with the cloud storage verification code so that you can download the model weights.

4. Application Form Template

Please include the following in your email:


Name: [Your full name]

Institution: [Your institution or organization name]

Position: [Your position or role, e.g., PhD student, researcher]

Research Field: [Briefly describe your research field]

Intended Use: [Briefly explain the intended use of the model weights]

Contact Email: [Your email address]

I hereby commit to using the model weights solely for scientific research purposes and not for any commercial use, and to comply with the relevant provisions of the open-source license. Furthermore, I understand that any problems or losses arising from use of the model are my own responsibility, and the authors are exempt from liability (after all, if the model behaves mischievously, even the authors can't control it).


3. Evaluate Generated Images with GSHPS

We have released the GSHPS (Geographic Scene Holistic Perceptual Similarity) evaluation script and pre-trained model for assessing the quality of generated images.

3.1. Preparation

Note: The classification model covers the 11 heterogeneous scene assembly (HeSA) categories described in the paper: church, commercial, dense residential, industrial, medium residential, park, railway station, resort, school, sparse residential, and square. Even if your category is not among these 11, GSHPS can still roughly assess the similarity of images from an unknown category, although it cannot determine the specific class of the image under evaluation. If your samples fall outside these 11 categories and you need a more precise evaluation, you may need to retrain the scene classification model; fortunately, classification models are straightforward to train (see the sketch at the end of 3.2.2).

3.2. Usage

The GSHPS evaluation script computes a similarity metric by comparing your generated images with reference images. You can use our pre-trained classification model or train your own classification model to replace ours.

To compute the GSHPS metric, run the script calculate_gshps.py with the following arguments:

  • --ref: Path to the reference images folder.
  • --gen: Path to the generated images folder.
  • --batch_size: (Optional) Batch size for data loading. Default is 32.
  • --classification_model: (Optional) Path to the classification model.
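
The exact GSHPS formulation is defined in the paper. Purely as an illustration of the underlying idea (perceptual similarity between deep features of matched image pairs), a VGG16-based comparison might look like the sketch below; the names here are illustrative and are not the script's actual internals:

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg16, VGG16_Weights

# Illustrative only: compare a reference/generated image pair via VGG16
# features, mirroring the general idea of a holistic perceptual similarity.
weights = VGG16_Weights.IMAGENET1K_V1
extractor = vgg16(weights=weights).features.eval()
preprocess = weights.transforms()

@torch.no_grad()
def feature_similarity(ref_img, gen_img):
    """ref_img, gen_img: PIL images; returns cosine similarity in [-1, 1]."""
    ref_feat = extractor(preprocess(ref_img).unsqueeze(0)).flatten(1)
    gen_feat = extractor(preprocess(gen_img).unsqueeze(0)).flatten(1)
    return F.cosine_similarity(ref_feat, gen_feat).item()
```
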
3.2.1. Using the Pre-trained Classification Model
  1. Prepare image data:
  • Place your reference images in a folder, e.g., path/to/reference_images/.

  • Place your generated images in another folder, e.g., path/to/generated_images/.

  • Ensure that the image filenames in both folders are the same to allow correct matching.

  2. Run the GSHPS evaluation script:
python calculate_gshps.py --ref path/to/reference_images/ --gen path/to/generated_images/

The script will output the GSHPS score.

3.2.2. Training Your Own Classification Model

If you wish to use your own classification model:

  1. Train your model:
  • Train a classification model suitable for your dataset.
  2. Run the GSHPS evaluation script:
  • Run the script as before, specifying the path to your model:
python calculate_gshps.py --ref path/to/reference_images/ --gen path/to/generated_images/ --classification_model path/to/your_classification_model.pt
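
A minimal fine-tuning sketch with torchvision's VGG16 (the dataset path, class count, hyperparameters, and saved format are assumptions; adapt them to your data and to whatever calculate_gshps.py expects to load):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.models import vgg16, VGG16_Weights

NUM_CLASSES = 11  # assumption: set this to your own number of scene categories
DATA_DIR = "path/to/train_images"  # ImageFolder layout: one subfolder per class

weights = VGG16_Weights.IMAGENET1K_V1
model = vgg16(weights=weights)
# Replace the final classification layer to match the new category set.
model.classifier[6] = nn.Linear(4096, NUM_CLASSES)

dataset = datasets.ImageFolder(DATA_DIR, transform=weights.transforms())
loader = DataLoader(dataset, batch_size=32, shuffle=True)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(5):  # a few epochs often suffice when fine-tuning
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()

# Saved format is an assumption; match what calculate_gshps.py expects.
torch.save(model.state_dict(), "your_classification_model.pt")
```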

3.3. Notes

  • Image filename matching: The reference and generated images should have the same filenames to ensure correct pairing during GSHPS computation (see the sanity-check sketch after this list).

  • Image formats: Ensure that the images in both folders are in the same format and contain only image files.

  • Model compatibility: The classification model should be compatible with the feature extraction part in the script. By default, the script uses features from VGG16.

  • Folder structure: The script assumes that the reference and generated images are located in flat directories without subfolders. If your images are organized differently, you may need to adjust the script accordingly.
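
Since correct pairing depends on identical filenames, a quick sanity check before running the evaluation can save time. A small sketch (the folder paths are placeholders):

```python
import os

ref_dir = "path/to/reference_images"
gen_dir = "path/to/generated_images"

ref_files = set(os.listdir(ref_dir))
gen_files = set(os.listdir(gen_dir))

# Report any images present in one folder but missing from the other.
only_ref = sorted(ref_files - gen_files)
only_gen = sorted(gen_files - ref_files)
if only_ref or only_gen:
    print("Reference images without a generated match:", only_ref)
    print("Generated images without a reference match:", only_gen)
else:
    print(f"All {len(ref_files)} filenames match.")
```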

4. Examples

  • Word2Scene: Visualization of remote sensing scenes generated from text descriptions in the test set using different text-to-image generation methods. Zoom-in for better details.

  • Word2Scene-C: Directly generate scenes using scene concepts from the training set. The real image is randomly selected from the images in the corresponding scene concept. Zoom-in for better details.

  • Word2Scene Zero-shot: Directly generate scenes using scene concepts outside the training set (zero-shot). Zoom-in for better details.

  • Diversity and stability: Each sample was randomly generated 9 times from the same description using different methods.

  • Comparison of the models obtained at different epochs with the results generated at different LoRA strengths.

5. Interpretability of Word2Scene

  • How does textual input specifically influence the outcome of scene generation?

    We use heatmaps to explain how the text influences scene generation.

6. Paper

Word2Scene: Efficient remote sensing image scene generation with only one word via hybrid intelligence and low-rank representation

If you find it useful, we kindly request that you leave a star on our GitHub repository.

Should you incorporate Word2Scene into your research, please cite the Word2Scene article:

@article{REN2024231,
  title = {Word2Scene: Efficient remote sensing image scene generation with only one word via hybrid intelligence and low-rank representation},
  journal = {ISPRS Journal of Photogrammetry and Remote Sensing},
  author = {Jiaxin Ren and Wanzeng Liu and Jun Chen and Shunxi Yin and Yuan Tao},
  volume = {218},
  pages = {231--257},
  year = {2024},
  issn = {0924-2716},
  doi = {10.1016/j.isprsjprs.2024.11.002},
  url = {https://www.sciencedirect.com/science/article/pii/S0924271624004106}
}

7. Acknowledgement

  • LoRA. This repo contains the source code of the Python package loralib and several examples of how to integrate it with PyTorch models.
  • BLIP. A model that unifies the framework for visual-language pre-training and fine-tuning, enabling multimodal learning and cross-modal understanding.
  • Stable Diffusion. Stable Diffusion is a latent text-to-image diffusion model. We use it to generate the text-to-image samples.
  • Stable Diffusion web UI. A web interface for Stable Diffusion, implemented using the Gradio library. We use it to load the generated LoRA model and generate remote sensing scenes.

8. License

This repo is distributed under the GPL license. The code can be used for academic purposes only.
