[release/8.0] Fix a possible infinite wait for GC completion at process shutdown. #107844
+60
−2
Fixes: #107800
This is a partial/minimal port of #103877.
Cooperative process cleanup is fragile, and #103877 addresses many potential issues. However, that change is not small and in parts builds on 9.0 changes.
This is a port of a small part of the change to address a specific scenario that is known to affect end users.
Customer Impact
The bug was reported by internal partners. In some relatively infrequent cases a worker process may get stuck while exiting.
Such "stuck" processes could become a nuisance, especially when the memory footprint of workers is very large.
Regression
Appears to have been introduced in .NET 6: the repro scenario passes with 5.0 but deadlocks in 6.0, 8.0, and early 9.0 previews.
Testing
Added a targeted unit test.
Risk
Small.
The code already tries to detect if the process is shutting down. We just use a more reliable mechanism - a new Windows API introduced in Win10 (RtlDllShutdownInProgress).
The main concern is that there could be other similar issues.
The 9.0 fix addresses several more patterns similar to the one involved here. They may or may not result in actual failures, and proactively fixing other areas carries some added risk of breaking something, so we decided not to do that in a servicing fix.
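For readers unfamiliar with the pattern, here is a rough sketch of how a wait for GC completion could bail out once the process is shutting down. It is not the code from this PR. GcDoneEvent, WaitForGcDone, IsProcessShuttingDown, the polling loop, and the g_simulatedShutdown hook are all hypothetical names and choices made for illustration; only RtlDllShutdownInProgress (exported from ntdll.dll and resolved dynamically here) is the API mentioned above.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical hook so the sketch can also be exercised off Windows.
static std::atomic<bool> g_simulatedShutdown{false};

// Returns true if the process appears to be shutting down.
static bool IsProcessShuttingDown()
{
#ifdef _WIN32
    // Resolve the ntdll export once; it may be absent on older systems.
    using ShutdownFn = BOOLEAN (NTAPI *)(void);
    static const ShutdownFn fn = []() -> ShutdownFn {
        HMODULE ntdll = GetModuleHandleW(L"ntdll.dll");
        if (ntdll == nullptr)
            return nullptr;
        return reinterpret_cast<ShutdownFn>(
            GetProcAddress(ntdll, "RtlDllShutdownInProgress"));
    }();
    if (fn != nullptr && fn())
        return true;
#endif
    return g_simulatedShutdown.load();
}

// Hypothetical stand-in for the runtime's "GC done" event.
struct GcDoneEvent
{
    std::mutex m;
    std::condition_variable cv;
    bool done = false;

    void Signal()
    {
        {
            std::lock_guard<std::mutex> lock(m);
            done = true;
        }
        cv.notify_all();
    }
};

// Waits until the GC signals completion. Returns false if the wait was
// abandoned because the process is shutting down, since the thread that
// would signal the event may already have been terminated.
bool WaitForGcDone(GcDoneEvent& e,
                   std::chrono::milliseconds pollInterval = std::chrono::milliseconds(50))
{
    assert(pollInterval.count() > 0);
    std::unique_lock<std::mutex> lock(e.m);
    while (!e.done)
    {
        if (IsProcessShuttingDown())
            return false; // give up instead of waiting forever
        e.cv.wait_for(lock, pollInterval);
    }
    return true;
}
```

The point of the sketch is only that an unbounded wait is replaced by one that re-checks the shutdown state; how the actual change structures the wait may differ.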