[Unification] Generalize TransformerEncoder #240
Conversation
Codecov Report

| | gh/RdoubleA/29/base | #240 | +/- |
|---|---|---|---|
| Coverage | ? | 92.16% | |
| Files | ? | 50 | |
| Lines | ? | 3077 | |
| Branches | ? | 0 | |
| Hits | ? | 2836 | |
| Misses | ? | 241 | |
| Partials | ? | 0 | |
```python
from torchmultimodal.utils.common import get_clones


TransformerOutput = namedtuple(
```
Should we just uniformly subclass from `NamedTuple` for model outputs?
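For context, the `NamedTuple` route this comment suggests could look roughly like the sketch below; the field names are assumptions for illustration rather than the exact ones in this PR. Compared with `collections.namedtuple`, the subclass form gives per-field type annotations and defaults.

```python
from typing import NamedTuple, Optional, Tuple

from torch import Tensor


class TransformerOutput(NamedTuple):
    # Field names are illustrative; the PR's actual fields may differ.
    last_hidden_state: Optional[Tensor] = None
    hidden_states: Optional[Tuple[Tensor, ...]] = None
    attentions: Optional[Tuple[Tensor, ...]] = None
```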
```python
if return_hidden_states:
    all_hidden_states = all_hidden_states + (hidden_states,)

layer_head_mask = head_mask[i] if head_mask is not None else None
```
head_masks can be different across the layers?
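For reference, the `head_mask[i]` indexing in the snippet above is consistent with a mask that carries a leading `num_layers` dimension, so each layer can receive its own mask. A minimal sketch with made-up shapes:

```python
import torch

# Hypothetical sizes for illustration: 6 encoder layers, 8 attention heads.
n_layer, n_head = 6, 8

# One row per layer; 1.0 keeps a head, 0.0 masks it out.
head_mask = torch.ones(n_layer, n_head)
head_mask[3, 0] = 0.0  # mask head 0 in layer 3 only

for i in range(n_layer):
    layer_head_mask = head_mask[i] if head_mask is not None else None
    # layer_head_mask has shape (n_head,) and can differ from layer to layer
```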
```python
    return_attn_weights=True,
)

hidden_states = layer_outputs[0]
```
If `layer_outputs` only contains the hidden states, this will try to index the 0-th element instead of returning the full hidden states, right?
```python
if return_attn_weights:
    return outputs, attn_weights
else:
    return outputs
```

Consider adding a unit test for when these kwargs are False as well?
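A rough sketch of what such a test could look like, assuming pytest fixtures named `encoder` and `inputs` and the output field names shown here (all of which are assumptions for illustration, not the PR's actual test code):

```python
import torch


def test_forward_default_flags(encoder, inputs):
    # Hypothetical check: with both flags left False, only the final hidden
    # states should be populated and no attention weights returned.
    output = encoder(inputs, return_hidden_states=False, return_attn_weights=False)
    assert output.hidden_states is None
    assert output.attentions is None
    assert isinstance(output.last_hidden_state, torch.Tensor)
```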
```python
output = encoder(inputs, return_hidden_states=True, return_attn_weights=True)
```
Not sure if this has been addressed as the encoder layer doesn't always return two tensors, right?
```python
layer_outputs, attn_weights = layer_module(
```
`return_attn_weights` is set to True, so it is guaranteed to return two outputs.
Then we should probably ask `TransformerEncoderLayer` to always return two outputs anyways.
hmm, maybe I'll modify to handle both single and double outputs
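One way that handling could look, as a hedged sketch; the helper name `unpack_layer_outputs` is hypothetical rather than something this PR adds:

```python
from typing import Optional, Tuple, Union

from torch import Tensor


def unpack_layer_outputs(
    layer_outputs: Union[Tensor, Tuple[Tensor, Tensor]]
) -> Tuple[Tensor, Optional[Tensor]]:
    # Accept either a bare hidden-states tensor or a
    # (hidden_states, attn_weights) pair from the layer.
    if isinstance(layer_outputs, tuple):
        hidden_states, attn_weights = layer_outputs
        return hidden_states, attn_weights
    return layer_outputs, None
```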
```diff
@@ -22,3 +23,13 @@ def forward(self, x: Tensor) -> Tensor:
             self.eps,
         )
         return output.type_as(x)
+
+
+def fp32layernorm(x: Tensor, layernorm: nn.Module) -> Tensor:
```
Why do we need the functional version in addition to the `Fp32LayerNorm` class?
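For context, a functional form is usually there so an existing `nn.LayerNorm` module (for example one that ships with pretrained weights) can be run in fp32 without swapping its class. A minimal sketch of that idea, not necessarily the PR's exact implementation:

```python
import torch
from torch import Tensor, nn


def fp32layernorm_sketch(x: Tensor, layernorm: nn.Module) -> Tensor:
    # Illustrative only: run a caller-provided LayerNorm in fp32 and cast
    # the result back to the input dtype.
    output = layernorm(x.float())
    return output.type_as(x)


# Usage with an ordinary nn.LayerNorm the caller already owns:
ln = nn.LayerNorm(8)
x = torch.randn(2, 4, 8)
y = fp32layernorm_sketch(x, ln)
```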
@RdoubleA has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Differential Revision: [D38506881](https://our.internmc.facebook.com/intern/diff/D38506881)

## Summary

Add a general `TransformerEncoder` class that simply stacks `n` layers of our custom `TransformerEncoderLayer`. Repurpose `FLAVATransformerOutput` and use it as `TransformerOutput` for this class.

## Test plan

Newly added unit tests, `pytest test/modules/layers/test_transformer.py -vv`

```
===================================================== test session starts ======================================================
platform linux -- Python 3.9.12, pytest-7.1.1, pluggy-1.0.0 -- /fsx/users/rafiayub/conda/envs/torchmm/bin/python
cachedir: .pytest_cache
rootdir: /data/home/rafiayub/torchmultimodal, configfile: pyproject.toml
plugins: hydra-core-1.1.2, cov-3.0.0, mock-3.8.2
collected 5 items

test/modules/layers/test_transformer.py::TestTransformerEncoderLayer::test_forward_prenorm PASSED                        [ 20%]
test/modules/layers/test_transformer.py::TestTransformerEncoderLayer::test_forward_postnorm PASSED                       [ 40%]
test/modules/layers/test_transformer.py::TestTransformerCrossAttentionLayer::test_forward_prenorm PASSED                 [ 60%]
test/modules/layers/test_transformer.py::TestTransformerCrossAttentionLayer::test_forward_postnorm PASSED                [ 80%]
test/modules/layers/test_transformer.py::TestTransformerEncoder::test_forward PASSED                                     [100%]

====================================================== 5 passed in 1.59s =======================================================
```
```python
        for _ in range(n_layer)
    ]
)
self.num_layers = n_layer
```
not used?
hmmm I think it was used in CLIPTextEncoder. I won't be able to get to unifying that component, but maybe we should keep `self.num_layers` for when we do?
I think the caller can maintain the num layers. No need to add unused instance variables here.
aight I'll remove it
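To illustrate where this thread landed, a hedged sketch of a constructor without the stored attribute; the class and argument names are invented for the example. Any caller that needs the depth can recover it as `len(encoder.layer)`.

```python
from torch import nn


class EncoderStackSketch(nn.Module):
    # Hypothetical, stripped-down constructor: `layer_factory` stands in for
    # whatever builds a single TransformerEncoderLayer in the real class.
    def __init__(self, layer_factory, n_layer: int) -> None:
        super().__init__()
        self.layer = nn.ModuleList([layer_factory() for _ in range(n_layer)])
        # No self.num_layers: use len(self.layer) if the count is needed.
```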
```python
all_hidden_states: Tuple[Tensor, ...] = () if return_hidden_states else None
all_self_attentions: Tuple[Tensor, ...] = () if return_attn_weights else None

for i, layer_module in enumerate(self.layer):
```
you don't need enumeration
```python
for i, layer_module in enumerate(self.layer):
    if return_hidden_states:
        all_hidden_states = all_hidden_states + (hidden_states,)
```
hm will this add the original input hidden states too?
this is straight from the original `FLAVATransformerEncoder`:

```python
all_hidden_states.append(hidden_states)
```
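For context, a self-contained sketch of the accumulation pattern under discussion, with identity modules standing in for the real encoder layers: because the tuple is extended before each layer call, its first entry is indeed the original input, and the final layer's output is appended after the loop.

```python
import torch
from torch import nn

# Illustrative only: identity "layers" stand in for TransformerEncoderLayer.
layers = nn.ModuleList([nn.Identity() for _ in range(3)])
hidden_states = torch.randn(2, 4, 8)  # input embeddings

all_hidden_states = ()
for layer_module in layers:
    # Appending before the layer runs puts the original input first.
    all_hidden_states = all_hidden_states + (hidden_states,)
    hidden_states = layer_module(hidden_states)
# The last layer's output still has to be added after the loop.
all_hidden_states = all_hidden_states + (hidden_states,)

assert len(all_hidden_states) == len(layers) + 1
```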
Stack from ghstack (oldest at bottom):

Differential Revision: D38506881

## Summary

Add a general `TransformerEncoder` class that simply stacks `n` layers of our custom `TransformerEncoderLayer`. Repurpose `FLAVATransformerOutput` and use it as `TransformerOutput` for this class.

## Test plan

Newly added unit tests: `pytest test/modules/layers/test_transformer.py -vv`