Swapped naive dot product attention for flash attention #24

usryokousha · 2023-03-31T02:25:36Z

This pull request adds Flash Attention support to the MultiheadAttention module. Flash Attention is a recently proposed IO-aware algorithm that computes exact attention, giving the same result as the conventional implementation while reducing memory usage and improving training efficiency. The implementation in this pull request follows the paper "FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness" (https://arxiv.org/abs/2205.14135).

Changes Made:

Replaced the conventional multi-head attention mechanism with the Flash Attention mechanism in the forward() method.
Added support for key_padding_mask by folding it into the attn_mask passed to F.scaled_dot_product_attention().
Updated the incremental_state logic to work with the new tensor shapes introduced by Flash Attention.
Applied the xpos method to both the q and k tensors in the forward() method.
Replaced masked_fill with an additive mask that combines the attention mask and key padding mask.

Please review and merge.
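
For reviewers, here is a minimal sketch of the additive-mask idea from the list above. It is not the code in this branch. It assumes PyTorch 2.0 or later, q/k/v already shaped (batch, heads, seq_len, head_dim), an additive float attn_mask of shape (tgt_len, src_len), and a boolean key_padding_mask of shape (batch, src_len) where True marks padding. The helper name combine_masks is hypothetical and only used for illustration.

```python
import torch
import torch.nn.functional as F


def combine_masks(q, attn_mask=None, key_padding_mask=None):
    """Hypothetical helper: merge both masks into one additive float mask.

    Returns None when no mask is given, so that no mask tensor is created
    and the fused attention kernels remain eligible.
    """
    if attn_mask is None and key_padding_mask is None:
        return None
    mask = None
    if attn_mask is not None:
        # (tgt_len, src_len) -> (1, 1, tgt_len, src_len)
        mask = attn_mask.to(q.dtype)[None, None, :, :]
    if key_padding_mask is not None:
        neg_inf = torch.full((), float("-inf"), dtype=q.dtype, device=q.device)
        zero = torch.zeros((), dtype=q.dtype, device=q.device)
        # (batch, src_len) -> (batch, 1, 1, src_len); -inf where the key is padding
        pad = torch.where(key_padding_mask[:, None, None, :], neg_inf, zero)
        mask = pad if mask is None else mask + pad
    return mask


torch.manual_seed(0)
batch, heads, tgt_len, src_len, head_dim = 2, 4, 5, 5, 8
q = torch.randn(batch, heads, tgt_len, head_dim)
k = torch.randn(batch, heads, src_len, head_dim)
v = torch.randn(batch, heads, src_len, head_dim)

# Causal mask in additive form: -inf above the diagonal, 0 elsewhere.
causal = torch.full((tgt_len, src_len), float("-inf")).triu(1)
key_padding_mask = torch.zeros(batch, src_len, dtype=torch.bool)
key_padding_mask[1, -2:] = True  # last two keys of the second sample are padding

mask = combine_masks(q, causal, key_padding_mask)
out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)

# Naive reference: softmax(QK^T / sqrt(d) + mask) V
scores = q @ k.transpose(-2, -1) / head_dim ** 0.5
ref = torch.softmax(scores + mask, dim=-1) @ v

# With no masks, nothing is allocated and attn_mask is simply omitted.
out_unmasked = F.scaled_dot_product_attention(q, k, v, attn_mask=combine_masks(q))
```

Because both masks are additive, they compose with a single broadcasted sum instead of repeated masked_fill calls on the attention scores, and the output matches the naive reference up to floating-point tolerance.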

usryokousha · 2023-03-31T03:18:06Z

@microsoft-github-policy-service agree

mranzinger · 2023-04-27T09:24:14Z

I ran into some issues using this branch as-is, and created a pull request for it here: usryokousha#1

Please review and pull in, if applicable.

usryokousha · 2023-06-20T10:34:32Z

Please merge with master

usryokousha · 2023-06-20T10:35:00Z

Please merge with master

incorporated fast attention into attention

37b64d4

mranzinger added 4 commits April 23, 2023 18:08

Update multihead_attention.py

a5a9419

Update multihead_attention.py

412a1a3

Update multihead_attention.py

d4a62cc

Update multihead_attention.py

62cedb9

mranzinger and others added 2 commits May 9, 2023 19:21

Masks are now optional, and not created. Fixes required to support FlashAttention (e.g. no mask, fp/bf16)

29c6ead

Merge pull request #1 from mranzinger/efficient

dd69dcb

Oh I must have overlooked that

usryokousha closed this Jun 20, 2023

usryokousha reopened this Jun 20, 2023
