disable kernel promotion for amp training #5922
Merged
PR types

Performance optimization

PR changes

Models

Description
Background: previously, under AMP O2 mode the framework would select a low-precision kernel whenever an OP supported low precision, but this strategy carries a high risk of accuracy problems. To safeguard training accuracy, the framework adjusted the AMP strategy in version 2.5: under O2 mode, a low-precision kernel is now selected only when all of an Op's inputs are already low precision; otherwise the FP32 kernel is used (the "promote" strategy). As a result, some models may see a performance drop.
The old O2 strategy can be restored by passing use_promote=False to auto_cast. To minimize changes to individual model configurations, this PR adds support for setting this parameter in the model library. The model library currently defaults to use_promote=False, i.e. the old O2 strategy, which resolves the performance regression.
The default behavior of PaddlePaddle's dynamic graph mode is use_promote=True. If a newly added model runs into accuracy problems in the future, try adding this setting to the model configuration for debugging.
Framework PR: PaddlePaddle/Paddle#53742