Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL] Evict program cache on PI_ERROR_OUT_OF_RESOURCES #11987

Merged
merged 8 commits into from
Dec 8, 2023

Conversation

jzc
Copy link
Contributor

@jzc jzc commented Nov 22, 2023

This PR adds resource cleanup mechanisms when PI_ERROR_OUT_OF_RESOURCES occurs on piProgramBuild and piProgramLink. Currently. the whole cache is cleared, but this behavior can be extended in the future. In order to ensure thread safety of the cache clearing mechanism, the maps of the cache now hold their BuildResult's in shared_ptr's, which allows the cache to be cleared even if there are still threads that still hold references to a BuildResult.

@jzc jzc requested a review from a team as a code owner November 22, 2023 23:53
@jzc jzc requested a review from maarquitos14 November 22, 2023 23:53
@jzc jzc temporarily deployed to WindowsCILock November 23, 2023 08:04 — with GitHub Actions Inactive
@jzc jzc temporarily deployed to WindowsCILock November 23, 2023 08:42 — with GitHub Actions Inactive
Copy link
Contributor

@steffenlarsen steffenlarsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the approach, but we should probably address the race conditions before we consider merging this. Should it be converted to draft for good measure?

sycl/source/detail/program_manager/program_manager.cpp Outdated Show resolved Hide resolved
@jzc jzc marked this pull request as draft November 27, 2023 17:30
@jzc jzc force-pushed the program-cache-eviction branch from cb29584 to 9dce9fb Compare December 1, 2023 05:18
@jzc jzc temporarily deployed to WindowsCILock December 1, 2023 22:15 — with GitHub Actions Inactive
@jzc jzc temporarily deployed to WindowsCILock December 1, 2023 22:40 — with GitHub Actions Inactive
@jzc jzc marked this pull request as ready for review December 5, 2023 15:10
@jzc jzc temporarily deployed to WindowsCILock December 5, 2023 19:58 — with GitHub Actions Inactive
// CHECK-CACHE: piKernelRelease
// CHECK-CACHE: piKernelRelease
// CHECK-CACHE: piProgramRelease
// CHECK-CACHE: piProgramRelease
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cache is destructed in a different order now, which is why these are swapped. Also the previous version of this test only had 2 checks for release of program/kernel even though there were 3 programs/kernels that should have been created, which is why any extra check was added.

@jzc jzc temporarily deployed to WindowsCILock December 5, 2023 20:24 — with GitHub Actions Inactive
@jzc jzc temporarily deployed to WindowsCILock December 5, 2023 22:02 — with GitHub Actions Inactive
@jzc jzc temporarily deployed to WindowsCILock December 5, 2023 22:27 — with GitHub Actions Inactive
Copy link
Contributor

@steffenlarsen steffenlarsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just need to avoid the error, but just to avoid post-commit failure we should fix the unused variable.

@jzc jzc temporarily deployed to WindowsCILock December 7, 2023 16:31 — with GitHub Actions Inactive
@jzc jzc temporarily deployed to WindowsCILock December 7, 2023 16:57 — with GitHub Actions Inactive
@steffenlarsen steffenlarsen merged commit 6cf1ae0 into intel:sycl Dec 8, 2023
10 checks passed
KornevNikita added a commit to KornevNikita/llvm that referenced this pull request Jun 10, 2024
I'm observing cache overflow when running heavy tests on OCL backend
with gpu. Clear cache in case of PI_ERROR_OUT_OF_HOST_MEMORY as well as
for PI_ERROR_OUT_OF_RESOURCES.
Using as reference: intel#11987
dm-vodopyanov pushed a commit that referenced this pull request Jun 13, 2024
I'm observing cache overflow when running heavy tests on OCL backend
with gpu. Clear cache in case of PI_ERROR_OUT_OF_HOST_MEMORY as well as
for PI_ERROR_OUT_OF_RESOURCES.
Using as reference: #11987
ianayl pushed a commit to ianayl/sycl that referenced this pull request Jun 13, 2024
I'm observing cache overflow when running heavy tests on OCL backend
with gpu. Clear cache in case of PI_ERROR_OUT_OF_HOST_MEMORY as well as
for PI_ERROR_OUT_OF_RESOURCES.
Using as reference: intel#11987
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants