*equal contribution
With recent advances in the quality of NeRF-based 3D-aware GANs, projecting an image into the latent space of these models has a natural advantage over 2D GAN inversion: not only does it allow multi-view consistent editing of the projected image, but it also enables 3D reconstruction and novel view synthesis from a single image. However, explicit viewpoint control is a major hindrance in the 3D GAN inversion process, as both the camera pose and the latent code have to be optimized simultaneously to reconstruct the given image. Most works that explore the latent space of 3D-aware GANs rely on a ground-truth camera viewpoint or a deformable 3D model, which limits their applicability. In this work, we introduce a generalizable 3D GAN inversion method that infers the camera viewpoint and latent code simultaneously, enabling multi-view consistent semantic image editing. The key to our approach is to leverage pre-trained estimators for better initialization and to utilize the pixel-wise depth calculated from the NeRF parameters to better reconstruct the given image. We conduct extensive quantitative and qualitative experiments on image reconstruction and editing, and further compare our results with 2D GAN-based editing to demonstrate the advantages of utilizing the latent space of 3D GANs.
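At its core, the method jointly optimizes the latent code and the camera pose against a reconstruction loss. Below is a minimal PyTorch sketch of that loop; the names `g`, `w_init`, and `cam_init` are illustrative stand-ins, not the repo's actual API, and the paper's depth-based term is omitted for brevity:

```python
# Minimal sketch of joint latent + pose optimization (illustrative only).
import torch

def invert(g, target, w_init, cam_init, steps=500, lr=1e-2):
    """Optimize latent code `w` and camera pose `cam` to reconstruct `target`.

    `w_init` would come from a pre-trained e4e-style encoder and `cam_init`
    from a pre-trained pose estimator, as described in the abstract.
    """
    w = w_init.clone().requires_grad_(True)
    cam = cam_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([w, cam], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        img = g(w, cam)  # render the image at the current pose estimate
        loss = torch.nn.functional.l1_loss(img, target)
        loss.backward()
        opt.step()
    return w.detach(), cam.detach()
```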
For more information, check out the paper on arXiv or the Project page.
NVIDIA GPUs. We have done all testing on an RTX 3090 GPU.
64-bit Python 3.9, PyTorch 1.11.0 + CUDA toolkit 11.3
conda env create -f environment.yml
conda activate 3dganinv
pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu113
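To confirm that the CUDA-enabled build is active inside the environment, you can run a quick sanity check:

```python
# Sanity check for the PyTorch + CUDA install.
import torch

print(torch.__version__)          # expected: 1.11.0+cu113
print(torch.cuda.is_available())  # should print True on an RTX 3090
```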
Download the pre-trained weights from the Google Drive links.
Place the weights of the initializers and generators as follows:
└── root
    ├── initializer
    │   ├── pose_estimator.pt
    │   ├── pose_estimator_quat.pt
    │   ├── pose_estimator_afhq.pt
    │   ├── e4e_ffhq.pt
    │   └── e4e_afhq.pt
    └── pretrained_models
        ├── afhqcats512-128.pkl
        └── ffhqrebalanced512-128.pkl
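Once the weights are in place, they can be loaded along the following lines. This is a hedged sketch: the `'G_ema'` key follows the EG3D pickle convention, unpickling the generator assumes the EG3D code (`dnnlib`/`torch_utils`) is importable, and the `.pt` checkpoints may hold state dicts depending on how they were saved:

```python
# Sketch of loading the downloaded weights (paths follow the tree above).
import pickle
import torch

# EG3D-style generator pickle; requires the EG3D repo on the Python path.
with open('pretrained_models/ffhqrebalanced512-128.pkl', 'rb') as f:
    G = pickle.load(f)['G_ema'].cuda()

# Initializer checkpoints; these may be modules or state dicts.
pose_estimator = torch.load('initializer/pose_estimator.pt', map_location='cuda')
e4e_encoder = torch.load('initializer/e4e_ffhq.pt', map_location='cuda')
```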
We refer users to the preprocessing code from the EG3D repository.
We also provide an easy-to-use image alignment notebook at
In addition, we manually cropped the facial areas when inverting images of cats; a stand-in for that manual step is sketched below.
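The following is only an illustrative stand-in for the manual cropping, assuming a per-image box chosen by hand (the `box` coordinates are hypothetical):

```python
# Illustrative crop + resize around a cat face; we selected boxes by hand.
from PIL import Image

def crop_and_resize(path, box, size=512):
    """box = (left, top, right, bottom) pixel coordinates around the face."""
    img = Image.open(path).convert('RGB')
    return img.crop(box).resize((size, size), Image.LANCZOS)
```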
Run the inversion process:
python scripts/run_pti.py
You can edit the input and output directories, or the GPU number, in configs/paths_config.py.
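For reference, the relevant entries in configs/paths_config.py might look like the sketch below; the variable names are assumptions modeled on PTI-style configs, not the file's verified contents:

```python
# Hypothetical excerpt of configs/paths_config.py (names are assumptions).
input_data_path = 'data/aligned'   # directory of aligned input images
output_data_path = 'output'        # where reconstructions and edits are written
gpu_id = 0                         # index of the GPU to use
```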
EG3D model and implementation:
https://github.com/NVlabs/eg3d
Copyright (c) 2021-2022, NVIDIA Corporation & affiliates.
License (NVIDIA) https://github.com/NVlabs/eg3d/blob/main/LICENSE.txt
PTI implementation:
https://github.com/danielroich/PTI
Copyright (c) 2021 Daniel Roich
License (MIT) https://github.com/danielroich/PTI/blob/main/LICENSE
GANSPACE implementation:
https://github.com/harskish/ganspace
Copyright (c) 2020 harskish
License (Apache License 2.0) https://github.com/harskish/ganspace/blob/master/LICENSE
This implementation borrows heavily from the official implementations of EG3D and PTI. We are grateful to all of these projects.
@article{ko20233d,
  author  = {Ko, Jaehoon and Cho, Kyusun and Choi, Daewon and Ryoo, Kwangrok and Kim, Seungryong},
  title   = {3D GAN Inversion with Pose Optimization},
  journal = {WACV},
  year    = {2023},
}