
Update vit.py #274

Closed · wants to merge 1 commit

Conversation

LuYuchenOrRobert

In the official code of Vision Transformer, the final LayerNorm is applied after `self.transformer(x)` but before taking the mean or the class token from `x`. It is revised here to be consistent with the official version, since changing the position of the LayerNorm does affect the computed values (whether it affects model performance is unclear).

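For illustration, here is a minimal runnable sketch of the difference described above. The shapes are assumed for the example, with `x` standing in for the output of `self.transformer(x)`:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
dim = 64
norm = nn.LayerNorm(dim)
x = torch.randn(2, 65, dim)   # stand-in for the output of self.transformer(x): (batch, tokens, dim)

# ordering before this PR: mean pool first, then apply the final LayerNorm
pool_then_norm = norm(x.mean(dim=1))

# ordering in the official ViT code: apply the final LayerNorm to every token, then mean pool
norm_then_pool = norm(x).mean(dim=1)

# the two orderings generally give different values
print(torch.allclose(pool_then_norm, norm_then_pool))  # False in general
```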
lucidrains (Owner) commented Aug 5, 2023

@LuYuchenOrRobert oh yes, this would affect mean pooling (but not cls token)

i should change all the ViTs (whichever ones are using global mean pooling) to have this 'correct' order
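A quick numerical check of this claim, with assumed shapes: `LayerNorm` normalizes each token independently, so it commutes with selecting the cls token but not with mean pooling.

```python
import torch
import torch.nn as nn

dim = 64
norm = nn.LayerNorm(dim)
x = torch.randn(2, 65, dim)   # assumed (batch, tokens, dim) token sequence

# cls token: LayerNorm acts per token, so the order does not matter
print(torch.allclose(norm(x)[:, 0], norm(x[:, 0])))              # True

# mean pooling: the order does matter
print(torch.allclose(norm(x).mean(dim=1), norm(x.mean(dim=1))))  # False in general
```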

lucidrains added a commit that referenced this pull request Aug 9, 2023
lucidrains (Owner)

@LuYuchenOrRobert you want to see if 3e5d1be looks right to you? i'll finish converting the rest of the vision transformers using global mean pooling later this week

LuYuchenOrRobert (Author)

@lucidrains Sorry for my late response. It looks right now. Thank you for your contribution!

vivekh2000 commented Apr 21, 2024

Hi, @lucidrains. If you look at the forward function of the Transformer class in vit.py, it applies the layer norm after the Transformer block.
[screenshot of the Transformer class's forward function]
This means the norm operation is performed after every Transformer block, but it is only required after the last one, i.e., after this line,

x = self.transformer(x)

because the Attention and FeedForward classes already have their own LayerNorm.

I also think the call to to_latent(x) does nothing, since it is just nn.Identity, so it could also be removed. Thank you for the simple implementation. :)
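For reference, here is a simplified sketch of the structure under discussion. This is not the actual vit.py code; `TinyTransformer`, `PreNormAttention`, and `PreNormFeedForward` are hypothetical names. It shows each sub-block carrying its own pre-norm, a single `LayerNorm` after the last block, and `to_latent` as an `nn.Identity` pass-through.

```python
import torch
import torch.nn as nn

class PreNormAttention(nn.Module):
    def __init__(self, dim, heads):
        super().__init__()
        self.norm = nn.LayerNorm(dim)   # the block's own pre-norm
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        x = self.norm(x)
        out, _ = self.attn(x, x, x, need_weights=False)
        return out

class PreNormFeedForward(nn.Module):
    def __init__(self, dim, hidden_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.LayerNorm(dim),          # the block's own pre-norm
            nn.Linear(dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, dim),
        )

    def forward(self, x):
        return self.net(x)

class TinyTransformer(nn.Module):
    def __init__(self, dim, depth, heads=4, mlp_dim=128):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.ModuleList([PreNormAttention(dim, heads), PreNormFeedForward(dim, mlp_dim)])
            for _ in range(depth)
        ])
        self.norm = nn.LayerNorm(dim)   # applied once, after the last block
        self.to_latent = nn.Identity()  # no-op pass-through, as noted above

    def forward(self, x):
        for attn, ff in self.layers:
            x = attn(x) + x
            x = ff(x) + x
        x = self.norm(x)                # single final norm, not one per block
        return self.to_latent(x)

x = torch.randn(2, 65, 64)
print(TinyTransformer(dim=64, depth=2)(x).shape)   # torch.Size([2, 65, 64])
```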
