
XLNet bias fix on resize embeddings (cf #1124) #1162

Merged
merged 2 commits into master from xlnet-bias on Sep 2, 2019
Conversation

LysandreJik
Member

@LysandreJik commented Aug 31, 2019

Fixed an issue where the linear layer bias wasn't resized along with its weight when the embedding matrix was resized with XLNet (cf #1124).

This fix works for any model that needs to tie its weights between an embedding layer and a linear layer, where that linear layer has a bias.

@@ -327,6 +327,14 @@ def _tie_or_clone_weights(self, first_module, second_module):
         else:
             first_module.weight = second_module.weight

+        if hasattr(first_module, 'bias') and first_module.bias is not None:
+            first_module.bias.data = torch.nn.functional.pad(
+                first_module.bias.data,
+                (0, first_module.weight.shape[0] - first_module.bias.shape[0]),
+                'constant',
+                0
+            )
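As a standalone sketch of what this pad call does (hypothetical layer sizes, not the PR's actual test code), growing a linear layer's bias to match an already-resized weight matrix might look like:

```python
import torch
import torch.nn.functional as F

linear = torch.nn.Linear(8, 100)  # stand-in for the lm head
# pretend the vocabulary grew by 3 tokens and the weight was already resized
linear.weight = torch.nn.Parameter(torch.zeros(103, 8))

# the same padding trick: extend (or trim) the bias to match the weight's first dim
linear.bias.data = F.pad(
    linear.bias.data,
    (0, linear.weight.shape[0] - linear.bias.shape[0]),
    'constant',
    0,
)
print(linear.bias.shape)  # torch.Size([103])
```

The three new bias entries are zero-initialized, which matches the behavior the patch needs for newly added tokens.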
Member


It's a nice and concise way to do it, but I'm worried about two things here:

  • torch.nn.functional.pad is not present before PyTorch 1.2.0 (and I think we should aim to keep compatibility with 1.0.1+ for now, if possible).
  • when we reduce the size of the embeddings (which is supported right now), I believe this will break.

Member Author


I believe torch.nn.functional.pad was actually introduced much earlier: it appears in the documentation of version 1.0.0, and I've successfully run this code with torch 1.0.0 installed.

The pad function actually accepts negative padding values, in which case it removes the overflowing elements. In this scenario, it trims the last elements, just like the resize_token_embeddings method.

Here's an example, running on torch 1.0.0:

from pytorch_transformers import XLNetTokenizer, XLNetLMHeadModel

model: XLNetLMHeadModel = XLNetLMHeadModel.from_pretrained("xlnet-base-cased")
tok = XLNetTokenizer.from_pretrained("xlnet-base-cased")
print(model.lm_loss.bias.shape, model.lm_loss.weight.shape)
# torch.Size([32000]) torch.Size([32000, 768])

tok.add_tokens(["token"])
model.resize_token_embeddings(len(tok))
print(model.lm_loss.bias.shape, model.lm_loss.weight.shape)
# torch.Size([32001]) torch.Size([32001, 768])

model.resize_token_embeddings(len(tok) - 100)
print(model.lm_loss.bias.shape, model.lm_loss.weight.shape)
# torch.Size([31901]) torch.Size([31901, 768])
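The negative-padding behavior can also be checked directly on a plain tensor (a minimal sketch, independent of the model):

```python
import torch
import torch.nn.functional as F

bias = torch.arange(5, dtype=torch.float)  # tensor([0., 1., 2., 3., 4.])
grown = F.pad(bias, (0, 2))    # pads two zeros on the right
shrunk = F.pad(bias, (0, -2))  # negative value trims the last two elements
print(grown)   # tensor([0., 1., 2., 3., 4., 0., 0.])
print(shrunk)  # tensor([0., 1., 2.])
```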

Member


Damned, my doc search capabilities look quite rusty! Ok all good then :)

@thomwolf thomwolf merged commit 0287d26 into master Sep 2, 2019
@julien-c julien-c deleted the xlnet-bias branch December 18, 2019 01:38