[naga] Prove to downstream shader compilers that loops terminate

The Metal compiler and DXC are based on clang and inherit the "infinite loop without side effects is UB" rule from C++. SPIR-V also requires shader invocations to terminate.

Downstream shader compilers require that all loops terminate, but a loop might not terminate at runtime, and that mismatch gets us into trouble. The compilers are allowed to assume that loops do terminate, which has far-reaching consequences. See the comments in https://github.com/gfx-rs/wgpu/issues/6528 for the whole background.

WebGPU requires loops to terminate (https://github.com/gpuweb/gpuweb/pull/3126); otherwise the user agent might lose the device. The problem is that loop termination is statically unprovable in the general case, so this can't be a check we do. We therefore have to emit loops that might not terminate, and when we do, we trigger UB in downstream shader compilers.

To avoid triggering UB in downstream shader compilers we must prove to them that loops terminate or that they have side-effects.

The only way we've found to avoid the UB via side-effects is to loop based on a volatile bool (originally implemented in tint). Open question: are there other ways we could artificially introduce side-effects that prevent the UB?

This was done for Metal in https://github.com/gfx-rs/wgpu/pull/6545, but it prevents other meaningful optimizations like inlining. A previous iteration, in which the check happened only before the loop, was found to be very slow (https://github.com/gfx-rs/wgpu/issues/6518#issuecomment-2467234435). The new check is probably going to be extremely slow, since it happens on every loop iteration.

I'm proposing that we inject a counter that puts an upper bound on the number of loop iterations, so that downstream shader compilers will see that the loop does terminate (even if it will take a really long time). We can start with an upper bound of `u64::MAX` (using 2 `u32`s as outlined in https://github.com/gfx-rs/wgpu/issues/6528#issuecomment-2481883007) and see if we can get away with a single `u32` later. We can have this limit even though it's not part of the WGSL spec, since drivers will end up terminating the invocations and losing the device after a certain amount of time has passed, which will certainly happen before we loop `u64::MAX` times.

Doing it this way should be much faster than reading a volatile on every loop iteration, and it still allows other optimizations to see that the loop might terminate a lot earlier, so that it can even be inlined; see https://github.com/gfx-rs/wgpu/issues/6528#issuecomment-2477406932.
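To make the counter idea concrete, here is a minimal sketch in Rust that simulates a 64-bit iteration budget stored as two `u32` halves. It is not the code naga emits: `LoopBound`, `tick` and `bounded_loop` are hypothetical names chosen for illustration, and the real backends would emit the equivalent logic in MSL or HLSL at the top of each loop body.

```rust
/// Hypothetical 64-bit iteration budget kept as two u32 halves,
/// mirroring the "2 `u32`s" idea (names are illustrative, not naga's).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct LoopBound {
    hi: u32,
    lo: u32,
}

impl LoopBound {
    /// Budget of u64::MAX iterations.
    const MAX: LoopBound = LoopBound { hi: u32::MAX, lo: u32::MAX };

    /// Consumes one iteration of the budget.
    /// Returns false once the budget is exhausted, which is the
    /// point where the injected check would break out of the loop.
    fn tick(&mut self) -> bool {
        if self.hi == 0 && self.lo == 0 {
            return false;
        }
        // Borrow from the high half when the low half is about to wrap.
        if self.lo == 0 {
            self.hi -= 1;
        }
        self.lo = self.lo.wrapping_sub(1);
        true
    }
}

/// Runs `body` until it returns false or the budget runs out,
/// and reports how many times the body executed.
fn bounded_loop(mut bound: LoopBound, mut body: impl FnMut() -> bool) -> u64 {
    let mut iterations = 0u64;
    loop {
        // Injected check: this is what would let a downstream
        // compiler see that the loop has a finite upper bound.
        if !bound.tick() {
            break;
        }
        iterations += 1;
        // Original loop body; `false` stands in for the user's `break`.
        if !body() {
            break;
        }
    }
    iterations
}
```

With a budget of `LoopBound { hi: 0, lo: 3 }` and a body that never breaks, `bounded_loop` runs the body three times and then exits. With `LoopBound::MAX` the body's own exit condition is what ends the loop in practice, since the budget would only matter after `u64::MAX` iterations.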

---

Checklist

- [x] MSL 
- [x] HLSL
- [ ] SPIR-V
- [ ] GLSL?

[naga] Prove to downstream shader compilers that loops terminate #6572