
[GPU] Relax SDPA head size limitations for LLMs #24930

Merged
merged 1 commit on Jun 11, 2024

Conversation

@sshlyapn (Contributor) commented on Jun 10, 2024

Details:

  • Relax the SDPA head size limitation for LLMs from the single supported value of 128 to a range of 64 to 256
  • Fix an accuracy issue in SDPA first-token processing for the `TARGET_SEQ_LEN_BLOCK_SIZE % SUBGROUPS_PER_WG != 0` case
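The first bullet replaces an exact-match check with a range check. A minimal sketch of such a gate (hypothetical helper names, not the actual plugin code) might look like:

```cpp
#include <cstddef>

// Hypothetical sketch: after this change, any head size in [64, 256]
// is accepted by the SDPA kernel, instead of exactly 128 as before.
constexpr std::size_t kMinHeadSize = 64;
constexpr std::size_t kMaxHeadSize = 256;

constexpr bool sdpa_head_size_supported(std::size_t head_size) {
    return head_size >= kMinHeadSize && head_size <= kMaxHeadSize;
}
```

Models whose head size falls outside the range would presumably fall back to a non-fused attention path, as they did before for any size other than 128.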

…nge of 64 to 256; fix accuracy issue in SDPA kernel when processing the first token
@sshlyapn sshlyapn requested review from a team as code owners June 10, 2024 12:03
@sshlyapn sshlyapn added the category: GPU OpenVINO GPU plugin label Jun 10, 2024
@sshlyapn sshlyapn added this to the 2024.3 milestone Jun 10, 2024
@p-durandin p-durandin added this pull request to the merge queue Jun 11, 2024
Merged via the queue into openvinotoolkit:master with commit 00f4c99 Jun 11, 2024
102 checks passed
allnes pushed a commit to allnes/openvino that referenced this pull request Jun 27, 2024
sshlyapn added a commit to sshlyapn/openvino that referenced this pull request Jun 27, 2024
akladiev pushed a commit that referenced this pull request Jun 27, 2024
….2 (#25261)

### Details:
This PR is a backport of the original #24930 to the OV 2024.2 version.

- Relax SDPA head size limitations for LLMs from 128 only to a range of 64 to 256
- Fix accuracy issue in SDPA first token processing for `TARGET_SEQ_LEN_BLOCK_SIZE % SUBGROUPS_PER_WG != 0` case
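The second fix concerns how a block of target-sequence rows is divided among subgroups in a work-group. When the block size is not a multiple of the subgroup count, an even split does not exist: the last subgroup must handle fewer rows, and a kernel that assumes equal shares can touch out-of-range rows and corrupt results. A host-side C++ illustration of the correct distribution (hypothetical helper, not the actual OpenCL kernel code):

```cpp
#include <cstddef>

// Hypothetical illustration of the uneven-split case fixed here.
// With TARGET_SEQ_LEN_BLOCK_SIZE = 17 and SUBGROUPS_PER_WG = 4, a
// ceil-divide gives 5 rows per subgroup, so the last subgroup must be
// clamped to the 2 remaining rows rather than processing a full share.
constexpr std::size_t rows_for_subgroup(std::size_t block_size,
                                        std::size_t subgroups,
                                        std::size_t sg_id) {
    // Ceil-divide the block over the subgroups...
    const std::size_t per_sg = (block_size + subgroups - 1) / subgroups;
    const std::size_t start = sg_id * per_sg;
    // ...and clamp the tail subgroup to the rows that actually remain.
    if (start >= block_size)
        return 0;
    const std::size_t remaining = block_size - start;
    return remaining < per_sg ? remaining : per_sg;
}
```

In the 17-rows-over-4-subgroups example, subgroups 0 to 2 each take 5 rows and subgroup 3 takes the remaining 2, so every row is processed exactly once.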