
[New features] add roberta & gpt conversion #4407

Merged: 7 commits merged into PaddlePaddle:develop on Jan 11, 2023

Conversation

@wj-Mcat (Contributor) commented Jan 10, 2023

PR types

New features

PR changes

Models

Description

Add online conversion for roberta and gpt models.
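For readers new to this part of the codebase, the sketch below illustrates roughly what such an online conversion involves: mapping a PyTorch state dict onto Paddle parameter names and layouts. The key names and mappings are hypothetical examples, not the actual logic added in paddlenlp/transformers/conversion_utils.py.

```python
# Illustrative sketch only; key names and mappings below are hypothetical.
import numpy as np

# (pytorch key, paddle key, transpose?) -- nn.Linear weights are stored
# transposed relative to each other in the two frameworks.
NAME_MAPPINGS = [
    ("embeddings.word_embeddings.weight", "embeddings.word_embeddings.weight", False),
    ("encoder.layer.0.attention.self.query.weight", "encoder.layers.0.self_attn.q_proj.weight", True),
]

def convert_pytorch_state_dict(pytorch_state_dict):
    """Convert a dict of numpy arrays keyed by PyTorch names into Paddle names/layouts."""
    paddle_state_dict = {}
    for source_name, target_name, transpose in NAME_MAPPINGS:
        if source_name not in pytorch_state_dict:
            continue  # tolerate keys that exist on only one side, e.g. task heads
        weight = np.asarray(pytorch_state_dict[source_name])
        if transpose:
            weight = weight.T
        paddle_state_dict[target_name] = weight
    return paddle_state_dict
```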

@paddle-bot (bot) commented Jan 10, 2023

Thanks for your contribution!

@codecov (bot) commented Jan 10, 2023

Codecov Report

Merging #4407 (8f586c2) into develop (ed1f5ac) will increase coverage by 0.47%.
The diff coverage is 77.27%.

```diff
@@             Coverage Diff             @@
##           develop    #4407      +/-   ##
===========================================
+ Coverage    39.62%   40.10%   +0.47%     
===========================================
  Files          433      439       +6     
  Lines        60982    61568     +586     
===========================================
+ Hits         24165    24689     +524     
- Misses       36817    36879      +62     
```
| Impacted Files | Coverage Δ | |
|---|---|---|
| paddlenlp/transformers/conversion_utils.py | 29.79% <54.54%> (+6.32%) | ⬆️ |
| paddlenlp/transformers/gpt/configuration.py | 100.00% <100.00%> (ø) | |
| paddlenlp/transformers/gpt/modeling.py | 78.21% <100.00%> (+0.79%) | ⬆️ |
| paddlenlp/transformers/roberta/modeling.py | 90.18% <100.00%> (+0.32%) | ⬆️ |
| paddlenlp/utils/serialization.py | 88.28% <100.00%> (+67.18%) | ⬆️ |
| paddlenlp/transformers/chineseclip/modeling.py | 85.15% <0.00%> (-0.36%) | ⬇️ |
| paddlenlp/transformers/__init__.py | 100.00% <0.00%> (ø) | |
| paddlenlp/transformers/auto/modeling.py | 71.88% <0.00%> (ø) | |
| paddlenlp/transformers/auto/tokenizer.py | 81.74% <0.00%> (ø) | |
| paddlenlp/transformers/cmsim_lock/tokenizer.py | 100.00% <0.00%> (ø) | |

... and 11 more


@wj-Mcat wj-Mcat marked this pull request as ready for review January 11, 2023 05:11
@sijunhe (Collaborator) left a comment

A few minor issues.

tests/transformers/gpt/test_modeling.py (review thread outdated, resolved)
paddlenlp/transformers/roberta/modeling.py (review thread outdated, resolved)
Comment on lines +589 to +594

```python
if name_mapping.target_name in paddle_state_dict:
    paddle_numpy = paddle_state_dict.pop(name_mapping.target_name)
    model_state_saver.add(name_mapping.target_name, "paddle", paddle_numpy)
    model_state_saver.add(name_mapping.target_name, "paddle-shape", str(paddle_numpy.shape))

if name_mapping.source_name in pytorch_state_dict:
    pytorch_numpy = pytorch_state_dict.pop(name_mapping.source_name)
    model_state_saver.add(name_mapping.target_name, "pytorch", pytorch_numpy)
    model_state_saver.add(name_mapping.target_name, "pytorch-shape", str(pytorch_numpy.shape))
```
Collaborator:

Since these two ifs are separate, could there be an asymmetric case, i.e. one is true and the other false? Consider combining them into a single `if name_mapping.source_name in pytorch_state_dict and name_mapping.target_name in paddle_state_dict:`.

@wj-Mcat (Contributor, author) replied:

This can indeed happen: for example, on the PyTorch side the attention's q/k/v may be stored fused as a single weight, so there is only one tensor, while on the Paddle side they are still stored as three separate tensors.

In that case this code's role becomes: PyTorch hooks the forward of the attention layer (the parent layer of q/k/v), while Paddle hooks the forwards of q, k, and v inside the attention separately.

This came up while testing GPT and is done for hook compatibility: although the layer names cannot be matched one-to-one, it still provides some logit information to help with the comparison.
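To make the asymmetry above concrete, here is a small sketch of how a single fused qkv weight on the PyTorch side can map onto three separate Paddle tensors, so one source key corresponds to three target keys. The key names are hypothetical and this is not the code from this PR.

```python
# Sketch only: one fused qkv weight (e.g. GPT-2's c_attn, shape [hidden, 3 * hidden])
# becomes three Paddle tensors, so a single source key maps to three target keys.
import numpy as np

def split_fused_qkv(fused_weight):
    q, k, v = np.split(fused_weight, 3, axis=-1)  # each [hidden, hidden]
    return {
        "self_attn.q_proj.weight": q,
        "self_attn.k_proj.weight": k,
        "self_attn.v_proj.weight": v,
    }

fused = np.random.randn(8, 24).astype("float32")  # toy hidden_size = 8
parts = split_fused_qkv(fused)
assert all(w.shape == (8, 8) for w in parts.values())
```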

@wj-Mcat (Contributor, author) commented Jan 11, 2023:

The corresponding Attention output in the LogitComparer is:

```
+--------------------------------------------+---------------------------------------+------------------------------------------------+---------------------------------------+------------------------------------------------+
| decoder.layers.0.self_attn.q_proj.weight   | [ 0.50337297 -0.7915838   0.5165813 ] | [0.03290959 0.0143059  0.11999971]             | [ 0.50337297 -0.7915838   0.5165813 ] | [0.03290959 0.01430589 0.11999971]             |
+--------------------------------------------+---------------------------------------+------------------------------------------------+---------------------------------------+------------------------------------------------+
| decoder.layers.0.self_attn.k_proj.weight   | [ 0.50337297 -0.7915838   0.5165813 ] | [-0.10358325  0.05546366  0.13997011]          |                                       |                                                |
+--------------------------------------------+---------------------------------------+------------------------------------------------+---------------------------------------+------------------------------------------------+
| decoder.layers.0.self_attn.v_proj.weight   | [ 0.50337297 -0.7915838   0.5165813 ] | [ 0.24479835  0.00965955 -0.06201734]          |                                       |                                                |
+--------------------------------------------+---------------------------------------+------------------------------------------------+---------------------------------------+------------------------------------------------+
```
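As a rough illustration of the hook-based comparison described above, the sketch below registers forward hooks with paddle.nn and records a small fingerprint of each sublayer's output; the layer names and capture format are illustrative, not the actual LogitComparer implementation.

```python
# Sketch: register forward hooks on sublayers and record a small fingerprint of
# each output, so intermediate results can be compared across frameworks even
# when layer names don't line up one-to-one.
import paddle

captured = {}

def make_hook(name):
    def hook(layer, inputs, outputs):
        captured[name] = outputs.flatten()[:3].numpy()  # first few values, like the table above
    return hook

model = paddle.nn.Sequential(paddle.nn.Linear(4, 4), paddle.nn.Linear(4, 4))
for idx, sublayer in enumerate(model.children()):
    sublayer.register_forward_post_hook(make_hook(f"linear_{idx}"))

_ = model(paddle.randn([1, 4]))
print(captured)  # e.g. {'linear_0': array([...]), 'linear_1': array([...])}
```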

@sijunhe (Collaborator) left a comment

lgtm

@sijunhe sijunhe merged commit b2a24c6 into PaddlePaddle:develop Jan 11, 2023