[New features] add roberta & gpt conversion #4407
Conversation
Thanks for your contribution!
Codecov Report
@@ Coverage Diff @@
## develop #4407 +/- ##
===========================================
+ Coverage 39.62% 40.10% +0.47%
===========================================
Files 433 439 +6
Lines 60982 61568 +586
===========================================
+ Hits 24165 24689 +524
- Misses 36817 36879 +62
A few small issues.
```python
if name_mapping.target_name in paddle_state_dict:
    paddle_numpy = paddle_state_dict.pop(name_mapping.target_name)
    model_state_saver.add(name_mapping.target_name, "paddle", paddle_numpy)
    model_state_saver.add(name_mapping.target_name, "paddle-shape", str(paddle_numpy.shape))

if name_mapping.source_name in pytorch_state_dict:
    pytorch_numpy = pytorch_state_dict.pop(name_mapping.source_name)
    model_state_saver.add(name_mapping.target_name, "pytorch", pytorch_numpy)
    model_state_saver.add(name_mapping.target_name, "pytorch-shape", str(pytorch_numpy.shape))
```
Could keeping these two `if`s separate lead to an asymmetric situation, i.e. one is true while the other is false? Consider merging them into a single condition: `if name_mapping.source_name in pytorch_state_dict and name_mapping.target_name in paddle_state_dict:`
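For illustration, a minimal sketch of the merged form being suggested here, reusing the names from the snippet above (whether popping both dicts inside one branch is acceptable depends on the surrounding loop, so this is an assumption, not the PR's actual code):

```python
# Merged condition: record a weight pair only when both sides have it,
# keeping the "paddle" and "pytorch" entries for a target name symmetric.
if (name_mapping.source_name in pytorch_state_dict
        and name_mapping.target_name in paddle_state_dict):
    paddle_numpy = paddle_state_dict.pop(name_mapping.target_name)
    pytorch_numpy = pytorch_state_dict.pop(name_mapping.source_name)
    model_state_saver.add(name_mapping.target_name, "paddle", paddle_numpy)
    model_state_saver.add(name_mapping.target_name, "paddle-shape", str(paddle_numpy.shape))
    model_state_saver.add(name_mapping.target_name, "pytorch", pytorch_numpy)
    model_state_saver.add(name_mapping.target_name, "pytorch-shape", str(pytorch_numpy.shape))
```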
This asymmetry does exist. For example, on the PyTorch side the attention QKV can be stored once, as a single weight, while on the Paddle side the same parameters are still stored as three tensors.
The effect of this code then becomes: PyTorch hooks the forward of the attention layer (the parent layer of QKV), while Paddle hooks the forwards of q, k, and v inside attention separately.
This was discovered while testing GPT, and it is done for hook compatibility. Even though the layers cannot be matched one-to-one by name, it still yields some logit information to support the comparison.
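As a hypothetical illustration of that asymmetry (the key names and shapes below are made up for the sketch, not taken from this PR): one fused QKV weight on the PyTorch side corresponds to three separate tensors on the Paddle side, so the two membership checks can legitimately disagree.

```python
import numpy as np

hidden = 768  # hypothetical hidden size

# PyTorch side: q, k and v stored once, as a single fused projection weight.
pytorch_state_dict = {
    "h.0.attn.c_attn.weight": np.random.rand(hidden, 3 * hidden).astype("float32"),
}

# Paddle side: the same parameters kept as three separate tensors.
fused = pytorch_state_dict["h.0.attn.c_attn.weight"]
q, k, v = np.split(fused, 3, axis=-1)
paddle_state_dict = {
    "decoder.layers.0.self_attn.q_proj.weight": q,
    "decoder.layers.0.self_attn.k_proj.weight": k,
    "decoder.layers.0.self_attn.v_proj.weight": v,
}

# For k_proj and v_proj there is no separate PyTorch key, so only the
# paddle-side check succeeds; this is why the two `if`s stay independent.
```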
In the LogitComparer, the corresponding Attention output is:
+--------------------------------------------+---------------------------------------+------------------------------------------------+---------------------------------------+------------------------------------------------+
| decoder.layers.0.self_attn.q_proj.weight | [ 0.50337297 -0.7915838 0.5165813 ] | [0.03290959 0.0143059 0.11999971] | [ 0.50337297 -0.7915838 0.5165813 ] | [0.03290959 0.01430589 0.11999971] |
+--------------------------------------------+---------------------------------------+------------------------------------------------+---------------------------------------+------------------------------------------------+
| decoder.layers.0.self_attn.k_proj.weight | [ 0.50337297 -0.7915838 0.5165813 ] | [-0.10358325 0.05546366 0.13997011] | | |
+--------------------------------------------+---------------------------------------+------------------------------------------------+---------------------------------------+------------------------------------------------+
| decoder.layers.0.self_attn.v_proj.weight | [ 0.50337297 -0.7915838 0.5165813 ] | [ 0.24479835 0.00965955 -0.06201734] | | |
+--------------------------------------------+---------------------------------------+------------------------------------------------+---------------------------------------+------------------------------------------------+
Force-pushed from d16d413 to ac28583.
lgtm
PR types
New features
PR changes
Models
Description
Add online conversion code for roberta and gpt.