[New features] add roberta & gpt conversion #4407
Conversation
Thanks for your contribution!
Codecov Report
@@ Coverage Diff @@
## develop #4407 +/- ##
===========================================
+ Coverage 39.62% 40.10% +0.47%
===========================================
Files 433 439 +6
Lines 60982 61568 +586
===========================================
+ Hits 24165 24689 +524
- Misses 36817 36879 +62
A few small issues.
```python
if name_mapping.target_name in paddle_state_dict:
    paddle_numpy = paddle_state_dict.pop(name_mapping.target_name)
    model_state_saver.add(name_mapping.target_name, "paddle", paddle_numpy)
    model_state_saver.add(name_mapping.target_name, "paddle-shape", str(paddle_numpy.shape))

if name_mapping.source_name in pytorch_state_dict:
    pytorch_numpy = pytorch_state_dict.pop(name_mapping.source_name)
    model_state_saver.add(name_mapping.target_name, "pytorch", pytorch_numpy)
    model_state_saver.add(name_mapping.target_name, "pytorch-shape", str(pytorch_numpy.shape))
```
Could keeping these two `if`s separate lead to an asymmetric situation, i.e. one is true while the other is false? Consider merging them into a single condition: `if name_mapping.source_name in pytorch_state_dict and name_mapping.target_name in paddle_state_dict:`
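For illustration, a minimal sketch of the merged form being suggested here, reusing the names from the snippet above (whether popping both dicts inside one branch is acceptable depends on the surrounding loop, so this is an assumption, not the PR's actual code):

```python
# Merged condition: record a weight pair only when both sides have it,
# keeping the "paddle" and "pytorch" entries for a target name symmetric.
if (name_mapping.source_name in pytorch_state_dict
        and name_mapping.target_name in paddle_state_dict):
    paddle_numpy = paddle_state_dict.pop(name_mapping.target_name)
    pytorch_numpy = pytorch_state_dict.pop(name_mapping.source_name)
    model_state_saver.add(name_mapping.target_name, "paddle", paddle_numpy)
    model_state_saver.add(name_mapping.target_name, "paddle-shape", str(paddle_numpy.shape))
    model_state_saver.add(name_mapping.target_name, "pytorch", pytorch_numpy)
    model_state_saver.add(name_mapping.target_name, "pytorch-shape", str(pytorch_numpy.shape))
```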
This asymmetry does exist. For example, on the PyTorch side the attention QKV can be stored once, as a single weight, while on the Paddle side the same parameters are still stored as three tensors.
The effect of this code then becomes: PyTorch hooks the forward of the attention layer (the parent layer of QKV), while Paddle hooks the forwards of q, k, and v inside attention separately.
This was discovered while testing GPT, and it is done for hook compatibility. Even though the layers cannot be matched one-to-one by name, it still yields some logit information to support the comparison.
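As a hypothetical illustration of that asymmetry (the key names and shapes below are made up for the sketch, not taken from this PR): one fused QKV weight on the PyTorch side corresponds to three separate tensors on the Paddle side, so the two membership checks can legitimately disagree.

```python
import numpy as np

hidden = 768  # hypothetical hidden size

# PyTorch side: q, k and v stored once, as a single fused projection weight.
pytorch_state_dict = {
    "h.0.attn.c_attn.weight": np.random.rand(hidden, 3 * hidden).astype("float32"),
}

# Paddle side: the same parameters kept as three separate tensors.
fused = pytorch_state_dict["h.0.attn.c_attn.weight"]
q, k, v = np.split(fused, 3, axis=-1)
paddle_state_dict = {
    "decoder.layers.0.self_attn.q_proj.weight": q,
    "decoder.layers.0.self_attn.k_proj.weight": k,
    "decoder.layers.0.self_attn.v_proj.weight": v,
}

# For k_proj and v_proj there is no separate PyTorch key, so only the
# paddle-side check succeeds; this is why the two `if`s stay independent.
```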
In the LogitComparer, the corresponding Attention output is:
+--------------------------------------------+---------------------------------------+------------------------------------------------+---------------------------------------+------------------------------------------------+
| decoder.layers.0.self_attn.q_proj.weight | [ 0.50337297 -0.7915838 0.5165813 ] | [0.03290959 0.0143059 0.11999971] | [ 0.50337297 -0.7915838 0.5165813 ] | [0.03290959 0.01430589 0.11999971] |
+--------------------------------------------+---------------------------------------+------------------------------------------------+---------------------------------------+------------------------------------------------+
| decoder.layers.0.self_attn.k_proj.weight | [ 0.50337297 -0.7915838 0.5165813 ] | [-0.10358325 0.05546366 0.13997011] | | |
+--------------------------------------------+---------------------------------------+------------------------------------------------+---------------------------------------+------------------------------------------------+
| decoder.layers.0.self_attn.v_proj.weight | [ 0.50337297 -0.7915838 0.5165813 ] | [ 0.24479835 0.00965955 -0.06201734] | | |
+--------------------------------------------+---------------------------------------+------------------------------------------------+---------------------------------------+------------------------------------------------+
Force-pushed from d16d413 to ac28583.
lgtm
PR types
New features
PR changes
Models
Description
Add online conversion code for roberta and gpt.