
Update models weights for inception_v3, vgg11, and vgg13 #3851

Merged
merged 2 commits into from
May 17, 2021

Conversation

NicolasHug
Member

This PR updates the URLs for the above models. The weights have been uploaded to the S3 bucket and the internal Facebook DB.

Closes #3767
Closes #2473

The following snippet now passes:

import torch
from torchvision import models

for model_name in ('inception_v3', 'vgg11', 'vgg13'):
    getattr(models, model_name)(pretrained=True)


root = '/data/home/nicolashug/.cache/torch/hub/checkpoints/'
for filename in ('inception_v3_google-0cc3c7bd.pth', 'vgg11-8a719046.pth', 'vgg13-19584684.pth'):
    for device in ('cpu', 'cuda', 'cuda:0', 'cuda:1'):
        torch.load(root + filename, map_location=device)
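For context (this is my reading, not confirmed above): the original failures are characteristic of checkpoints written in torch's legacy, pre-1.6 pickle-stream format; re-saving with a current `torch.save` rewrites them as a zip archive. The helper name below is mine, but the stdlib check is a quick way to tell the two formats apart:

```python
import io
import zipfile

def uses_zip_serialization(path_or_buf) -> bool:
    """True if a checkpoint uses torch's zip-based (>=1.6) format.

    Accepts a file path or raw bytes. Legacy pickle-stream checkpoints
    are not zip archives, so zipfile.is_zipfile returns False for them.
    """
    if isinstance(path_or_buf, (bytes, bytearray)):
        return zipfile.is_zipfile(io.BytesIO(bytes(path_or_buf)))
    return zipfile.is_zipfile(path_or_buf)
```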

The new weights have been generated and tested like so:

import torch
from torchvision import models

def get_available_classification_models():
    # Public, lowercase callables in torchvision.models are the model
    # builder functions (e.g. vgg11, inception_v3).
    return [
        k for k, v in models.__dict__.items()
        if callable(v) and k[0].lower() == k[0] and k[0] != "_"
    ]

all_classification_models = get_available_classification_models()
print(all_classification_models)


for model_name in all_classification_models:
    # These four have no pretrained weights available in torchvision.
    if model_name in ('mnasnet0_75', 'mnasnet1_3', 'shufflenet_v2_x1_5', 'shufflenet_v2_x2_0'):
        continue
    getattr(models, model_name)(pretrained=True)
    getattr(models, model_name)(pretrained=True)




from glob import glob

all_files = glob('/data/home/nicolashug/.cache/torch/hub/checkpoints/*.pth')
baddies = []
goodies = []

for filename in all_files:
    try:
        state_dict = torch.load(filename, map_location='cuda')
    except RuntimeError:
        baddies.append(filename)
    else:
        goodies.append(filename)

print("these files are OK:")
print(goodies)
print("these ones failed:")
print(baddies)


import hashlib
import os

output_dir = '/data/home/nicolashug/new_models'
tmp_path = os.path.join(output_dir, 'blah')

for filename in baddies:
    model_name = '-'.join(filename.split('/')[-1].split('-')[:-1])
    model = torch.load(filename)
    torch.save(model, tmp_path)

    sha256_hash = hashlib.sha256()
    with open(tmp_path, "rb") as f:
        # Read and update hash string value in blocks of 4K
        for byte_block in iter(lambda: f.read(4096), b""):
            sha256_hash.update(byte_block)
        hh = sha256_hash.hexdigest()


    output_path = os.path.join(output_dir, model_name + "-" + hh[:8] + ".pth")
    os.replace(tmp_path, output_path)


# Make sure they can be loaded now
for filename in glob(output_dir + '/*'):
    state_dict = torch.load(filename, map_location='cuda')
    
print("The newly updated weights don't fail anymore")
print(f"They are located in {output_dir}")

# Also make sure the new dicts are the same as the old ones
for d1, d2 in zip(sorted(glob(output_dir + '/*')), sorted(baddies)):
    d1 = torch.load(d1)
    d2 = torch.load(d2)
    assert d1.keys() == d2.keys()
    for v1, v2 in zip(d1.values(), d2.values()):
        torch.testing.assert_allclose(v1, v2)

print("newly updated weights seem to be the same as the previous ones.")
print("all good")
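The renaming step in the script above follows torchvision's weight-file naming convention: `<model>-<first 8 hex chars of the file's SHA-256>.pth`, which is the prefix `load_state_dict_from_url` verifies when `check_hash=True`. A minimal sketch of that convention (the helper name is mine):

```python
import hashlib

def checkpoint_filename(model_name: str, payload: bytes) -> str:
    # "<model>-<8-char sha256 prefix>.pth" -- the prefix embedded in the
    # filename is what the hub downloader checks against the file contents.
    digest = hashlib.sha256(payload).hexdigest()
    return f"{model_name}-{digest[:8]}.pth"
```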

@datumbox
Contributor
LGTM.

Since the pre-trained weights are not covered by any unit test (to avoid flakiness from downloading data), I strongly recommend verifying that the reported accuracies before and after the change remain the same by running:

python -m torch.distributed.launch --nproc_per_node=2 --use_env train.py --model inception_v3 --test-only --pretrained

python -m torch.distributed.launch --nproc_per_node=2 --use_env train.py --model vgg11 --test-only --pretrained

python -m torch.distributed.launch --nproc_per_node=2 --use_env train.py --model vgg13 --test-only --pretrained

@NicolasHug
Member Author

@datumbox thanks for the tips. What is this train.py file that you're referring to?

In terms of consistency with the previous weights, I made sure to check the following (in the "details" section above):

# Also make sure the new dicts are the same as the old ones
for d1, d2 in zip(sorted(glob(output_dir + '/*')), sorted(baddies)):
    d1 = torch.load(d1)
    d2 = torch.load(d2)
    assert d1.keys() == d2.keys()
    for v1, v2 in zip(d1.values(), d2.values()):
        torch.testing.assert_allclose(v1, v2)
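The check above relies on `torch.testing`; a torch-free sketch of the same idea, for plain dicts of flat float lists (toy helper, name mine), is:

```python
import math

def state_dicts_allclose(d1, d2, rel_tol=1e-5, abs_tol=1e-8):
    """Same-keys, elementwise-close check. Real state dicts hold tensors
    and should use torch.testing instead of this toy version."""
    if d1.keys() != d2.keys():
        return False
    return all(
        len(d1[key]) == len(d2[key])
        and all(math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
                for a, b in zip(d1[key], d2[key]))
        for key in d1
    )
```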

@NicolasHug NicolasHug merged commit 541e0f1 into pytorch:master May 17, 2021
@NicolasHug
Member Author

NicolasHug commented May 17, 2021

I was able to get the same accuracy results on master and on this branch with the above internal script. Thanks for the refs!

facebook-github-bot pushed a commit that referenced this pull request May 19, 2021
Update models weights for inception_v3, vgg11, and vgg13 (#3851)

Reviewed By: cpuhrsch

Differential Revision: D28538761

fbshipit-source-id: f54b3534daf5c6f4c1ad9e346415553991dd6f9c
Successfully merging this pull request may close these issues.

Failing to load the pre-trained weights on multi-gpus
Unable to load VGG model's state dict on GPU