
Wgpu/Clamp Kernels #866

Merged · 19 commits · Oct 23, 2023
Conversation

@agelas (Contributor) commented Oct 15, 2023

Pull Request Template

Checklist

  • Confirmed that the run-checks script has been executed.

Related Issues/PRs

#549

Changes

  1. Added clamp kernels and shaders to wgpu backend.
  2. @antimora I used the mask_fill method that was formerly the default for the candle backend. I took a peek at candle-core and it looks like they added their own clamp method about a month ago, so we can update once burn-candle is pinned to a more recent version.

Testing

Linked backends to their respective clamp_min/clamp_max implementations instead of mask_fill.

@nathanielsimard (Member) left a comment:

The kernels look good, but I would probably make the < and > operations a template key. You would need to create your own KernelTemplate wrapper to fill the template, so it's more Rust code vs. more WGSL code; it's a very optional refactor.
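The template-key idea could be sketched as plain string substitution: one WGSL source with a placeholder comparison operator, filled per kernel variant. This is a minimal sketch; `CLAMP_BOUND_TEMPLATE`, `{{CMP_OP}}`, and `render` are hypothetical names, not the actual burn-wgpu KernelTemplate API.

```rust
// Hypothetical sketch of the template-key idea: a single WGSL snippet
// with a placeholder comparison operator shared by clamp_min/clamp_max.
const CLAMP_BOUND_TEMPLATE: &str = r#"
if input[id] {{CMP_OP}} bound {
    output[id] = bound;
} else {
    output[id] = input[id];
}
"#;

/// Fill the placeholder for either the min (`<`) or max (`>`) variant.
fn render(template: &str, cmp_op: &str) -> String {
    template.replace("{{CMP_OP}}", cmp_op)
}

fn main() {
    let clamp_min_src = render(CLAMP_BOUND_TEMPLATE, "<");
    let clamp_max_src = render(CLAMP_BOUND_TEMPLATE, ">");
    assert!(clamp_min_src.contains("input[id] < bound"));
    assert!(clamp_max_src.contains("input[id] > bound"));
    println!("templates rendered");
}
```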

Comment on lines 426 to 448
fn clamp_min<const D: usize>(
    tensor: FloatTensor<Self, D>,
    min: FloatElem<Self>,
) -> FloatTensor<Self, D> {
    let mask = Self::lower_elem(tensor.clone(), min);
    Self::mask_fill(tensor, mask, min)
}

fn clamp_max<const D: usize>(
    tensor: FloatTensor<Self, D>,
    max: FloatElem<Self>,
) -> FloatTensor<Self, D> {
    let mask = Self::greater_elem(tensor.clone(), max);
    Self::mask_fill(tensor, mask, max)
}

fn clamp<const D: usize>(
    tensor: FloatTensor<Self, D>,
    min: FloatElem<Self>,
    max: FloatElem<Self>,
) -> FloatTensor<Self, D> {
    Self::clamp_min(Self::clamp_max(tensor, max), min)
}
Member:

Those are the default implementations, we should implement those methods only if we actually add custom implementations.

Author (@agelas):

Should I take those out of here and put them back in burn-tensor/src/tensor/ops/tensor.rs?

Member:

Yes!

burn-tensor/src/tensor/ops/tensor.rs (outdated, resolved)
burn-tensor/src/tensor/ops/tensor.rs (outdated, resolved)
@louisfd (Member) left a comment:

The main problem I see is that your kernel is always in-place. Normally we like to have one kernel for when tensors can be mutated (in-place), and another for when they cannot (because there are too many references to them), which creates an output buffer. This logic is already implemented; look for the unary ops. Using the unary template, you will only have to provide the logic for each value. For an example, see relu in burn-wgpu/src/ops/activation_ops.rs.
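The in-place vs. out-of-place decision can be illustrated with plain Rust reference counting: mutate the buffer only when we hold the sole reference, otherwise allocate a fresh output. This is a conceptual sketch only; burn-wgpu's actual buffer-reuse logic works on GPU handles, not `Rc`.

```rust
use std::rc::Rc;

// Conceptual sketch: reuse the allocation ("in-place kernel") only
// when we are the sole owner; otherwise write into a new buffer.
fn clamp_max_dispatch(buffer: Rc<Vec<f32>>, max: f32) -> Rc<Vec<f32>> {
    match Rc::try_unwrap(buffer) {
        // Sole owner: mutate in place.
        Ok(mut data) => {
            for v in data.iter_mut() {
                if *v > max {
                    *v = max;
                }
            }
            Rc::new(data)
        }
        // Shared: the out-of-place path allocates an output buffer.
        Err(shared) => Rc::new(shared.iter().map(|v| v.min(max)).collect()),
    }
}

fn main() {
    // Sole owner takes the in-place path.
    let out = clamp_max_dispatch(Rc::new(vec![0.5, 2.0]), 1.0);
    assert_eq!(*out, vec![0.5, 1.0]);

    // A shared buffer takes the out-of-place path; the input survives.
    let input = Rc::new(vec![0.5, 2.0]);
    let out = clamp_max_dispatch(Rc::clone(&input), 1.0);
    assert_eq!(*input, vec![0.5, 2.0]);
    assert_eq!(*out, vec![0.5, 1.0]);
    println!("ok");
}
```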

Also, creating a third kernel for clamp (with both lower and upper bounds) would be very low-hanging fruit once you have made the other two, and it would be a great optimization, as we would launch only one kernel instead of two in this case.
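The fusion win can be checked on the CPU: one pass applying both bounds produces the same result as chaining a clamp_max pass into a clamp_min pass. A sketch of the idea only, not the wgpu kernel itself:

```rust
// Two separate passes, mirroring launching two kernels.
fn clamp_min_pass(data: &[f32], min: f32) -> Vec<f32> {
    data.iter().map(|v| v.max(min)).collect()
}
fn clamp_max_pass(data: &[f32], max: f32) -> Vec<f32> {
    data.iter().map(|v| v.min(max)).collect()
}

// One fused pass, mirroring a single clamp kernel with both bounds.
fn clamp_fused(data: &[f32], min: f32, max: f32) -> Vec<f32> {
    data.iter().map(|v| v.max(min).min(max)).collect()
}

fn main() {
    let data = [-2.0_f32, -0.5, 0.5, 3.0];
    let chained = clamp_min_pass(&clamp_max_pass(&data, 1.0), 0.0);
    let fused = clamp_fused(&data, 0.0, 1.0);
    assert_eq!(chained, fused);
    assert_eq!(fused, vec![0.0, 0.0, 0.5, 1.0]);
    println!("fused == chained");
}
```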

@agelas (Contributor, Author) commented Oct 20, 2023

@louisfd I see what you're saying. I'm not super familiar with proc macros, so correct me if I'm wrong, but don't they expect static literal inputs? In other words, unary! or unary_inplace! can't evaluate the clamp value when the macros are expanded at compile time. So as an alternative I can just write one kernel for in-place and one for not. Looking at some WGSL references, it turns out clamp(value, min, max) is a built-in function, so I can handle all three variations.

@louisfd (Member) commented Oct 20, 2023

@agelas
You're right, unary is not the right one; I got fooled by relu because it's kind of a clamp, but its clamp value is hardcoded.
It would rather be unary_scalar, where you can pass a second, scalar argument.

@agelas (Contributor, Author) commented Oct 21, 2023

@louisfd I think this is closer to what you're looking for now. There are no proc macros that really fit the pattern I'd need to make use of WGSL's built-in clamp, so I wrote kernels for in-place and not in-place.

@louisfd (Member) left a comment:

It looks very good now!
For candle, if we only implement clamp and not clamp_min and clamp_max, those two will fall back to their default implementations, which use a combination of kernels, so they will be slower than clamp. Looking at candle's code, it seems clamp is just a combination of minimum and maximum, which are public, so it should be trivial to have clamp_min call maximum and clamp_max call minimum.
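The suggested mapping rests on the identities clamp_min(x, m) = max(x, m) and clamp_max(x, m) = min(x, m), which can be checked in plain Rust on scalars (candle's actual Tensor API is not used here):

```rust
fn main() {
    let values = [-3.0_f32, -1.0, 0.0, 1.0, 3.0];
    for v in values {
        // Clamping from below is exactly an elementwise maximum...
        assert_eq!(v.max(-1.5), if v < -1.5 { -1.5 } else { v });
        // ...and clamping from above is exactly an elementwise minimum.
        assert_eq!(v.min(1.5), if v > 1.5 { 1.5 } else { v });
    }
    println!("identities hold");
}
```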

@agelas (Contributor, Author) commented Oct 23, 2023

@louisfd Done!

@louisfd (Member) left a comment:

Thanks a lot @agelas !
Merging it.

@louisfd louisfd merged commit 07c0cf1 into tracel-ai:main Oct 23, 2023