
replace searchsorted with in-built function #35

Merged


WillBrennan
Contributor

Recent versions of PyTorch include a built-in searchsorted (torch.searchsorted). I had issues compiling your project with the versions of GCC and NVCC on my system. It looks like you were thinking along similar lines, judging by your TODO comment.
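A minimal sketch of where the built-in slots in, assuming a NeRF-style inverse-transform sampler that draws extra samples per ray from per-bin weights. The function name `sample_pdf_sketch` and the tensor shapes are illustrative assumptions, not this repository's exact code; the line that matters is `torch.searchsorted(cdf, u, right=True)`, which does the right-sided CDF lookup without any compiled extension.

```python
import torch


def sample_pdf_sketch(bins: torch.Tensor, weights: torch.Tensor, n_samples: int) -> torch.Tensor:
    """Inverse-transform sampling from a piecewise-constant PDF (hypothetical sketch).

    bins:    [n_rays, n_bins + 1] sorted bin edges along each ray
    weights: [n_rays, n_bins] non-negative weights, one per bin
    returns: [n_rays, n_samples] sample positions
    """
    weights = weights + 1e-5  # keep all-zero rays from dividing by zero
    pdf = weights / weights.sum(dim=-1, keepdim=True)
    cdf = torch.cumsum(pdf, dim=-1)
    cdf = torch.cat([torch.zeros_like(cdf[..., :1]), cdf], dim=-1)  # [n_rays, n_bins + 1]

    # Uniform samples in [0, 1), one row per ray.
    u = torch.rand(cdf.shape[:-1] + (n_samples,), device=cdf.device)

    # Built-in search: with right=True each index is the first position whose
    # CDF value is strictly greater than u.
    inds = torch.searchsorted(cdf, u, right=True)
    below = torch.clamp(inds - 1, min=0)
    above = torch.clamp(inds, max=cdf.shape[-1] - 1)

    cdf_below = torch.gather(cdf, -1, below)
    cdf_above = torch.gather(cdf, -1, above)
    bins_below = torch.gather(bins, -1, below)
    bins_above = torch.gather(bins, -1, above)

    # Linearly interpolate inside the selected bin; guard near-empty bins.
    denom = cdf_above - cdf_below
    denom = torch.where(denom < 1e-5, torch.ones_like(denom), denom)
    t = (u - cdf_below) / denom
    return bins_below + t * (bins_above - bins_below)
```

The clamps keep `below` and `above` inside the CDF when a uniform sample lands at or past the last CDF value, which floating-point rounding of the cumulative sum can produce.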

Because I wasn't able to compile the original torchsearchsorted module, I haven't been able to check whether the results are identical. However, training on the low-res models is converging and PSNR is going up at a reasonable rate.
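Even without compiling torchsearchsorted, the built-in's lookup semantics can be checked against a brute-force definition of batched searchsorted: the left-sided index counts sorted entries strictly below each value, and the right-sided index counts entries less than or equal to it. A sketch, assuming PyTorch is installed; `reference_searchsorted` and `check_against_builtin` are hypothetical helpers, and passing the check only shows the index lookup is correct, not that training results are bit-identical.

```python
import torch


def reference_searchsorted(sorted_seq: torch.Tensor, values: torch.Tensor, right: bool) -> torch.Tensor:
    """Brute-force batched searchsorted.

    sorted_seq: [batch, n] sorted along the last dim
    values:     [batch, m]
    returns:    [batch, m] insertion indices
    """
    seq = sorted_seq.unsqueeze(-2)  # [batch, 1, n]
    vals = values.unsqueeze(-1)     # [batch, m, 1]
    # Count how many sorted entries come before each value: [batch, m, n] -> [batch, m]
    before = seq <= vals if right else seq < vals
    return before.sum(dim=-1)


def check_against_builtin(batch: int = 64, n: int = 65, m: int = 128, seed: int = 0) -> None:
    gen = torch.Generator().manual_seed(seed)
    cdf = torch.cumsum(torch.rand(batch, n, generator=gen), dim=-1)
    cdf = cdf / cdf[..., -1:]  # normalise so each row ends at 1, like a CDF
    u = torch.rand(batch, m, generator=gen)
    for right in (False, True):
        expected = reference_searchsorted(cdf, u, right)
        actual = torch.searchsorted(cdf, u, right=right)
        assert torch.equal(actual, expected), f"mismatch with right={right}"


check_against_builtin()
```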

@WillBrennan WillBrennan force-pushed the feature/replace-searchsorted branch from 3d268ed to 49e1093 April 21, 2021 21:55
@yenchenlin
Owner

yenchenlin commented Apr 21, 2021

Hello @WillBrennan ,

I previously merged commits addressing this issue. However, I am not sure whether this or other commits ended up hurting performance, so I rolled back the code.

Can you let me know if you can get comparable results before I merge it?

Much thanks!

@salykova

salykova commented Apr 21, 2021

Hi @WillBrennan
Please check the issue. That's why the code was rolled back, and that's why these requirements are used instead of torch>=1.8 and torchvision>=0.9.1. Summary: with the new versions of torch and torchvision, the provided pretrained models cannot be rendered.

@WillBrennan WillBrennan force-pushed the feature/replace-searchsorted branch from 49e1093 to 434e781 April 21, 2021 22:12
@WillBrennan
Contributor Author

Hi everyone, I wasn't aware of that previous rendering issue. I'll keep you posted on the training and rendering results. It looks like there have been lots of other changes since then. If it works on my branch, I'll run a git bisect and try to pin down the bug as well.

@yenchenlin
Copy link
Owner

yenchenlin commented Apr 21, 2021

Hey @WillBrennan, you are right: I am not sure exactly what breaks the performance. Please keep me posted, and huge thanks for the effort.

@WillBrennan
Contributor Author

WillBrennan commented Apr 23, 2021

Sorry for the late reply! The model trained correctly on a single machine with a 2080Ti in just under 8 hours, using the minimum PyTorch and torchvision versions in requirements.txt on CUDA 10.2.

I've uploaded the model, output videos and console logs to a Google Drive folder for you to inspect before merging this PR.
https://drive.google.com/drive/folders/1lWQuF4ylr-vVFJAikRsle9-DofnFBCIE?usp=sharing

I'll start git-bisecting through the dev branch to work out what's going on there. I'd love to have multi-GPU support and train a lot quicker!

@WillBrennan
Contributor Author

Sorry! I just realised the Drive folder with the results wasn't public. I've updated that now.

@WillBrennan
Contributor Author

** nudge **

@yenchenlin
Owner

Hey sorry for the delay. The results look stunning, thanks so much for this!

@yenchenlin
Owner

Do you mind if I spend some time training other models before merging?

@WillBrennan
Contributor Author

No worries; of course not. Let me know if I can help; I've got spare GPUs waiting to heat-up a room.

@yenchenlin
Owner

yenchenlin commented May 14, 2021 via email

@yenchenlin
Owner

ping @WillBrennan

@WillBrennan
Contributor Author

Oops, sorry! I forgot to mention that I'm training the LLFF scenes at the moment. They should be done by tomorrow.

@WillBrennan
Contributor Author

I've just uploaded the outputs from the logs directory to the folder I linked above:

https://drive.google.com/drive/folders/1uq0OSpyCuSIOBbT12L3pLiEnMoKIwMDo?usp=sharing

It looks like it's all working correctly. The folder contains:

  • fern
  • flowers
  • fortress
  • horns
  • leaves
  • orchids
  • room
  • trex

@salykova

salykova commented Jun 21, 2021

Hi @WillBrennan! I have just tried to render using your pretrained models and got the following error:

Found ckpts ['./logs/fern_test/200000.tar']
Reloading from ./logs/fern_test/200000.tar
Traceback (most recent call last):
  File "run_nerf.py", line 878, in <module>
    train()
  File "run_nerf.py", line 640, in train
    render_kwargs_train, render_kwargs_test, start, grad_vars, optimizer = create_nerf(args)
  File "run_nerf.py", line 231, in create_nerf
    model.load_state_dict(ckpt['network_fn_state_dict'])
  File "/home/mnsv/miniconda3/envs/nerf/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1407, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for NeRF:
    Missing key(s) in state_dict: "pts_linears.0.weight", "pts_linears.0.bias", "pts_linears.1.weight", "pts_linears.1.bias", "pts_linears.2.weight", "pts_linears.2.bias", "pts_linears.3.weight", "pts_linears.3.bias", "pts_linears.4.weight", "pts_linears.4.bias", "pts_linears.5.weight", "pts_linears.5.bias", "pts_linears.6.weight", "pts_linears.6.bias", "pts_linears.7.weight", "pts_linears.7.bias", "views_linears.0.weight", "views_linears.0.bias", "feature_linear.weight", "feature_linear.bias", "alpha_linear.weight", "alpha_linear.bias", "rgb_linear.weight", "rgb_linear.bias".
    Unexpected key(s) in state_dict: "module.pts_linears.0.weight", "module.pts_linears.0.bias", "module.pts_linears.1.weight", "module.pts_linears.1.bias", "module.pts_linears.2.weight", "module.pts_linears.2.bias", "module.pts_linears.3.weight", "module.pts_linears.3.bias", "module.pts_linears.4.weight", "module.pts_linears.4.bias", "module.pts_linears.5.weight", "module.pts_linears.5.bias", "module.pts_linears.6.weight", "module.pts_linears.6.bias", "module.pts_linears.7.weight", "module.pts_linears.7.bias", "module.views_linears.0.weight", "module.views_linears.0.bias", "module.feature_linear.weight", "module.feature_linear.bias", "module.alpha_linear.weight", "module.alpha_linear.bias", "module.rgb_linear.weight", "module.rgb_linear.bias".

May I ask: did you test the models with the built-in torch.searchsorted and with torch 1.8? Did you use the main or dev branch?

@WillBrennan
Contributor Author

This wasn’t with the master or dev branch. It was with the branch for this PR to show it’s working as expected.

It looks like you’ve tried to load a single-GPU model into a DataParallel object, so I’m guessing you’re running off the broken dev branch that added multi-GPU support?

If you use the branch from this PR or master, it’ll load correctly.
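For reference, the traceback shows every checkpoint key carrying a `module.` prefix while the unwrapped NeRF model expects none. That is the pattern produced when weights are saved from an `nn.DataParallel` wrapper and then loaded into a plain model. A small sketch, using a hypothetical stand-in model `TinyNet` whose parameter names mimic NeRF's, showing where the prefix comes from and one way (the hypothetical helper `unwrapped_state_dict`) to save weights without it:

```python
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for the NeRF MLP; only the parameter names matter here."""

    def __init__(self) -> None:
        super().__init__()
        self.pts_linears = nn.ModuleList([nn.Linear(3, 8), nn.Linear(8, 8)])
        self.rgb_linear = nn.Linear(8, 3)


def unwrapped_state_dict(model: nn.Module) -> dict:
    """Return a state_dict without the 'module.' prefix added by (Distributed)DataParallel."""
    if isinstance(model, (nn.DataParallel, nn.parallel.DistributedDataParallel)):
        model = model.module  # the wrapper stores the real model under .module
    return model.state_dict()


plain = TinyNet()
wrapped = nn.DataParallel(plain)  # on a CPU-only machine this simply wraps the module
print(list(plain.state_dict())[:2])    # ['pts_linears.0.weight', 'pts_linears.0.bias']
print(list(wrapped.state_dict())[:2])  # ['module.pts_linears.0.weight', 'module.pts_linears.0.bias']
```

Saving `unwrapped_state_dict(model)` instead of `model.state_dict()` keeps checkpoints loadable by a single-GPU model regardless of how training was run.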

@yenchenlin yenchenlin merged commit f1e5b3f into yenchenlin:master Jun 21, 2021
@yenchenlin
Owner

Yo sorry @salykovaa you wanna try again?

@yenchenlin
Owner

@WillBrennan huge huge thanks <3

@WillBrennan WillBrennan deleted the feature/replace-searchsorted branch June 21, 2021 21:11
@WillBrennan
Contributor Author

Anytime! It’s always great to see a project like this on GitHub!

@salykova

@WillBrennan hmm, interesting... I used the master branch. But OK, I will try again tomorrow ;) Maybe I did something wrong.

@salykova

salykova commented Jun 22, 2021

Hi @WillBrennan! I have just finished testing your models. As I said, yesterday (and today) I used the master branch and tried to render, but got the same error I posted yesterday. I don't know why the error appears. I tried Python 3.6 and 3.7, and PyTorch 1.8 and 1.9, but the models you provided don't work. Interestingly, rendering works perfectly with the old models from 2020 provided by @yenchenlin here.

@jason718

jason718 commented Oct 6, 2021


Same issue, @WillBrennan.

One solution is to replace this line:

model.load_state_dict(ckpt['network_fn_state_dict'])

with:

from collections import OrderedDict

new_ckpt = OrderedDict()
for k, v in ckpt['network_fn_state_dict'].items():
    if k.startswith('module.'):
        new_ckpt[k[len('module.'):]] = v  # strip the 'module.' prefix added by DataParallel
    else:
        new_ckpt[k] = v
model.load_state_dict(new_ckpt)
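Recent PyTorch releases (assumed here to be 1.9 or newer) also ship a utility for the same strip, `torch.nn.modules.utils.consume_prefix_in_state_dict_if_present`, which edits a state_dict in place. A sketch, assuming such a version is installed; `load_stripped` is a hypothetical wrapper that copies the checkpoint first so the caller's dict keeps its original keys.

```python
from collections import OrderedDict
from typing import Mapping

import torch
import torch.nn as nn
from torch.nn.modules.utils import consume_prefix_in_state_dict_if_present


def load_stripped(model: nn.Module, state_dict: Mapping[str, torch.Tensor]) -> None:
    """Load a checkpoint saved from a DataParallel wrapper into an unwrapped model."""
    # Copy so the in-place strip below does not mutate the caller's checkpoint.
    # The copy drops the optional _metadata attribute; load_state_dict works without it.
    stripped = OrderedDict(state_dict)
    consume_prefix_in_state_dict_if_present(stripped, "module.")  # no-op if no key has the prefix
    model.load_state_dict(stripped)
```

In this project's loader, that would correspond to `load_stripped(model, ckpt['network_fn_state_dict'])`.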

@yenchenlin
Owner

yenchenlin commented Oct 6, 2021

@salykovaa I've tested on my own machine and the issue is confirmed. I've changed the pre-trained model links back to my original Google Drive.

@WillBrennan it seems your saved weights have 'module.' as a key prefix, which makes model loading fail.

@jason718 thank you! This solution strips the 'module.' prefix from every key, so PyTorch can successfully load the model provided by @WillBrennan.
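The same idea can be generalised to work in either direction, so one checkpoint loads into both a plain and a DataParallel-wrapped model. A sketch in plain Python, where `match_prefix` is a hypothetical helper and `target_keys` would normally be `model.state_dict().keys()`; it only renames keys and never touches the tensor values.

```python
from collections import OrderedDict
from typing import Iterable, Mapping, TypeVar

V = TypeVar("V")

PREFIX = "module."  # prefix that (Distributed)DataParallel puts on every parameter name


def match_prefix(state_dict: Mapping[str, V], target_keys: Iterable[str]) -> "OrderedDict[str, V]":
    """Add or strip the 'module.' prefix so checkpoint keys match the target model's keys."""
    target_has_prefix = any(k.startswith(PREFIX) for k in target_keys)
    fixed: "OrderedDict[str, V]" = OrderedDict()
    for key, value in state_dict.items():
        bare = key[len(PREFIX):] if key.startswith(PREFIX) else key
        fixed[PREFIX + bare if target_has_prefix else bare] = value
    return fixed
```

Usage would then be along the lines of `model.load_state_dict(match_prefix(ckpt['network_fn_state_dict'], model.state_dict().keys()))`.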

SRDewan pushed a commit to SRDewan/nerf-pytorch that referenced this pull request Jul 19, 2022
…rchsorted

replace searchsorted with in-built function