Remove default independent sampler jitter but ensure positive variance #888

uri-granta · 2025-01-03T10:24:23Z

Related issue(s)/PRs:

Summary

As discussed elsewhere, jitter isn't necessary for independent reparametrization sampling, beyond wanting to ensure that the variance is non-zero.

Fully backwards compatible: yes

PR checklist

The quality checks are all passing
The bug case / new feature is covered by tests
Any new features are well-documented (in docstrings or notebooks)

uri-granta · 2025-01-06T10:04:40Z

trieste/models/gpflow/sampler.py

@@ -133,7 +134,7 @@ def sample(self, at: TensorType, *, jitter: float = DEFAULTS.JITTER) -> TensorTy
        tf.debugging.assert_greater_equal(jitter, 0.0)

        mean, var = self._model.predict(at[..., None, :, :])  # [..., 1, 1, L], [..., 1, 1, L]
-        var = var + jitter
+        var = ensure_positive(var + jitter)


(note that we could alternatively ignore the jitter argument here, even if it's explicitly provided, if we think that would be better)

This version might be a bit difficult to read and debug, as we are potentially applying a correction twice (we apply the jitter with the sum, then with ensure_positive we potentially add an offset).

But I'm not sure if there exists a better alternative
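To make the double correction concrete, here is a minimal sketch. It uses NumPy as a stand-in for the TensorFlow helper in the diff (same thresholds), and the variance values are made up for illustration:

```python
import numpy as np


def ensure_positive(x):
    # NumPy stand-in for the helper in the diff: clamp from below with a
    # dtype-dependent threshold.
    floor = 1e-6 if x.dtype == np.float32 else 1e-16
    return np.maximum(x, floor)


# Illustrative float64 variances: exactly zero, tiny, and ordinary.
var = np.array([0.0, 1e-20, 1e-3], dtype=np.float64)
jitter = 1e-6

# Version in the diff: the jitter is added first, then the floor is applied.
# For non-negative variances and this jitter, the floor never triggers.
both = ensure_positive(var + jitter)

# Floor only: small variances are lifted to 1e-16 and the ordinary one is untouched.
floor_only = ensure_positive(var)

print(both)
print(floor_only)
```

With a non-zero jitter the offset is applied to every entry, whereas the floor alone only changes entries that are below the threshold.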

One solution to both this comment and the one at the end would be to change the default value to -1, and comment that this magic value doesn't add jitter but ensures that the variance is positive. And then if the user specifies an explicit non-negative jitter we can use that unmodified?

(Engineering-wise it would be nicer to make jitter an Optional[float] but that would necessitate changing the interface and modifying the other samplers too.)
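A rough sketch of what the `Optional[float]` variant could look like, again in NumPy rather than TensorFlow. The function name `adjusted_variance` is hypothetical and not part of trieste; it only illustrates the proposed semantics (no jitter given means "just ensure positivity", an explicit non-negative jitter is used unmodified):

```python
from typing import Optional

import numpy as np


def adjusted_variance(var: np.ndarray, jitter: Optional[float] = None) -> np.ndarray:
    """Hypothetical helper illustrating the Optional[float] proposal."""
    if jitter is None:
        # Default: add nothing, only clamp from below (thresholds as in the diff).
        floor = 1e-6 if var.dtype == np.float32 else 1e-16
        return np.maximum(var, floor)
    if jitter < 0.0:
        raise ValueError("jitter must be non-negative")
    # Explicit jitter: use it as given, with no additional clamping.
    return var + jitter


zero_var = np.zeros(3, dtype=np.float64)
print(adjusted_variance(zero_var))              # clamped to the floor
print(adjusted_variance(zero_var, jitter=0.0))  # stays exactly zero
print(adjusted_variance(zero_var, jitter=1e-6))
```

This keeps a way to opt out of any correction (`jitter=0.0`), at the cost of an interface change.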

I would explicitly ignore the jitter here and note in the docstring that it is ignored. Perhaps let's also do it properly and change it to be optional.

here I think there should be no reason for the user to want a different jitter, right @vpicheny ?

I think the main case for the jitter here is when the sampling is used with an acquisition function, possibly using sqrt(var) or log(var) or cdf(mean, var), that would fail if the variance is numerically zero but slightly negative.

Otherwise we would probably just want to avoid any offset that would get in the way, e.g. say the output is not rescaled and has very very small values so adding 1e-6 would change everything.

We could leave this logic to the acquisition function, or just ensure here that we are "just positive".
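Both concerns can be seen numerically. The values below are invented for illustration and NumPy is used in place of TensorFlow:

```python
import numpy as np

# 1) A variance that is "numerically zero" but slightly negative breaks sqrt,
#    and an exact zero breaks log.
var = np.array([-1e-18, 0.0, 1e-12])
with np.errstate(invalid="ignore", divide="ignore"):
    std = np.sqrt(var)      # first entry is nan
    log_var = np.log(var)   # first entry is nan, second is -inf

# 2) For an output with very small values, a fixed 1e-6 offset swamps the signal:
#    two variances that differ by a factor of 4 become almost identical.
small = np.array([1e-10, 4e-10])
ratio_before = small[1] / small[0]
ratio_after = (small[1] + 1e-6) / (small[0] + 1e-6)

print(std, log_var)
print(ratio_before, ratio_after)
```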

@vpicheny so is your suggestion to ignore the jitter in IndependentReparametrizationSampler but still call ensure_positive?

@hstojic similarly, are you proposing to call ensure_positive in deep_ensemble_trajectory rather than adding DEFAULTS.JITTER?

let's raise a separate PR for keras and gpflux

I think @vpicheny is not sure either. The options are:

1. no change, and leave it to whatever is using sample, or
2. ignore it and make it barely positive when we find 0

I would go with 2, but perhaps then make sure it can be overridden by the jitter argument?

Not sure what you mean by "ignore it but make sure it can be overridden by jitter argument". What does overriding mean? (If we allow users to specify a jitter value then isn't that option 1?)

vpicheny · 2025-01-06T10:15:57Z

tests/unit/models/gpflow/test_sampler.py

@@ -285,6 +285,20 @@ def test_independent_reparametrization_sampler_reset_sampler(qmc: bool, qmc_skip
        npt.assert_array_less(1e-9, tf.abs(samples2 - samples1))


+@pytest.mark.parametrize("qmc", [True, False])
+@pytest.mark.parametrize("dtype", [tf.float32, tf.float64])
+def test_independent_reparametrization_sampler_sample_ensures_positive_variance(


I am not sure what this test is doing... Does setting the kernel amplitude to 0 make the model variance equal to zero? Should we then check that the model prediction variance is zero, but that the sampler applies the right fix?

Yes, that's right. I've now added an assert that the model variance is zero.
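The shape of such a test might look roughly like the following. This is a self-contained NumPy sketch, not the actual trieste test: `ZeroVarianceModel` and `sample` are hypothetical stand-ins for a model whose kernel amplitude is zero and for independent reparametrization sampling (mean + sqrt(var) * eps).

```python
import numpy as np


class ZeroVarianceModel:
    """Hypothetical stub: constant mean of 1 and exactly zero predictive variance."""

    def predict(self, x):
        mean = np.ones((x.shape[0], 1), dtype=x.dtype)
        var = np.zeros((x.shape[0], 1), dtype=x.dtype)
        return mean, var


def ensure_positive(x):
    # NumPy stand-in for the helper in the diff.
    return np.maximum(x, 1e-6 if x.dtype == np.float32 else 1e-16)


def sample(model, x, rng):
    # Independent reparametrization: one standard normal draw per point.
    mean, var = model.predict(x)
    var = ensure_positive(var)
    eps = rng.standard_normal(mean.shape).astype(x.dtype)
    return mean + np.sqrt(var) * eps


def check_sampler_ensures_positive_variance(dtype):
    x = np.linspace(0.0, 1.0, 5, dtype=dtype)[:, None]
    model = ZeroVarianceModel()
    _, var = model.predict(x)
    assert np.all(var == 0)  # the model variance really is zero
    samples = sample(model, x, np.random.default_rng(0))
    assert samples.dtype == dtype
    assert np.all(np.isfinite(samples))  # the sampler applied the fix
    np.testing.assert_allclose(samples, 1.0, atol=1e-2)


for dtype in (np.float32, np.float64):
    check_sampler_ensures_positive_variance(dtype)
```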

vpicheny · 2025-01-06T10:17:55Z

trieste/utils/misc.py

+def ensure_positive(x: TensorType) -> TensorType:
+    """Ensure that all the elements in `x` are strictly positive (using a dtype-dependent
+    capping threshold)."""
+    return tf.math.maximum(x, 1e-6 if x.dtype == tf.float32 else 1e-16)


naive question, is 1e-6 the lowest we can have with single precision?

Not at all. This was just based on scaling up the suggested value of 1e-16 for float64. Both numbers can go significantly smaller if we want: float32 can go down to around 1e-38 and float64 to 2e-308. Do you have any intuition for how small we should make these?
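For reference, the limits quoted above are the smallest positive normal numbers of each dtype, which can be checked with `np.finfo` (NumPy is used here purely for the lookup; the dtypes match TensorFlow's float32 and float64):

```python
import numpy as np

# Smallest positive normal number for each precision.
tiny32 = float(np.finfo(np.float32).tiny)
tiny64 = float(np.finfo(np.float64).tiny)
print(tiny32, tiny64)

# The thresholds currently in the diff are far above these limits.
print(1e-6 / tiny32, 1e-16 / tiny64)

# A single threshold such as 1e-32 is still representable in float32.
print(np.float32(1e-32) > 0)
```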

I think we may be fine with smallest number for each precision that makes it positive, though it may depend on the usage downstream - at the moment we are just taking sqrt and doing some multiplication, that will take it to equal 0 but in this use case it should be fine I think? eps contribution would be removed in these cases, but not sure if that's relevant

I agree, I would probably vote for a very small value in both cases. 1e-6 is way too high.

And maybe we do not need to differentiate between single and double precision? Both could be e.g. 1e-32 or something

we do want to avoid "jumps" so I would go with something close to the end of range - imagine having an adjacent point that is 1e-300 but then you swap 0 with 1e-32, that would create a jump, no?

ensure_positive will turn both 0 and 1e-300 to 1e-32, so there won't be a jump (but there won't be any gradient either)
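A small sketch of that point, assuming a shared threshold of 1e-32 (one of the values floated above, not necessarily what was merged) and using NumPy in place of TensorFlow:

```python
import numpy as np

floor = 1e-32
x = np.array([0.0, 1e-300, 1e-30, 1e-3])

# Everything below the floor is mapped to the same value, so adjacent points
# such as 0 and 1e-300 do not produce a jump.
clamped = np.maximum(x, floor)

# Derivative of max(x, floor) with respect to x, away from the tie x == floor:
# 0 where the floor is active, 1 where x passes through unchanged.
grad = (x > floor).astype(x.dtype)

print(clamped)
print(grad)
```

The flat region is what removes the gradient: any optimisation that relies on the variance changing with the input sees nothing below the threshold.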

vpicheny

Looks good but something that bothers me is that there is no way of bypassing the corrections given by ensure_positive.
If someone really wants to use the "true" variance, which could be exactly zero, or just control manually the amount of correction, there is no way of doing this.
But maybe it's OK as it is?

hstojic

see comments

hstojic · 2025-01-06T13:41:03Z

trieste/models/gpflow/sampler.py

@@ -133,7 +134,7 @@ def sample(self, at: TensorType, *, jitter: float = DEFAULTS.JITTER) -> TensorTy
        tf.debugging.assert_greater_equal(jitter, 0.0)

        mean, var = self._model.predict(at[..., None, :, :])  # [..., 1, 1, L], [..., 1, 1, L]
-        var = var + jitter
+        var = ensure_positive(var + jitter)


I would explicitly ignore the jitter here and note in the docstring that it is ignored. Perhaps let's also do it properly and change it to be optional.

hstojic · 2025-01-06T13:41:53Z

trieste/models/gpflow/sampler.py

@@ -133,7 +134,7 @@ def sample(self, at: TensorType, *, jitter: float = DEFAULTS.JITTER) -> TensorTy
        tf.debugging.assert_greater_equal(jitter, 0.0)

        mean, var = self._model.predict(at[..., None, :, :])  # [..., 1, 1, L], [..., 1, 1, L]
-        var = var + jitter
+        var = ensure_positive(var + jitter)


here I think there should be no reason for the user to want a different jitter, right @vpicheny ?

hstojic · 2025-01-06T14:16:53Z

trieste/utils/misc.py

+def ensure_positive(x: TensorType) -> TensorType:
+    """Ensure that all the elements in `x` are strictly positive (using a dtype-dependent
+    capping threshold)."""
+    return tf.math.maximum(x, 1e-6 if x.dtype == tf.float32 else 1e-16)


I think we may be fine with smallest number for each precision that makes it positive, though it may depend on the usage downstream - at the moment we are just taking sqrt and doing some multiplication, that will take it to equal 0 but in this use case it should be fine I think? eps contribution would be removed in these cases, but not sure if that's relevant

hstojic · 2025-01-06T14:20:20Z

trieste/models/gpflow/sampler.py

@@ -133,7 +134,7 @@ def sample(self, at: TensorType, *, jitter: float = DEFAULTS.JITTER) -> TensorTy
        tf.debugging.assert_greater_equal(jitter, 0.0)

        mean, var = self._model.predict(at[..., None, :, :])  # [..., 1, 1, L], [..., 1, 1, L]
-        var = var + jitter
+        var = ensure_positive(var + jitter)


can you also please check GPflux and keras samplers?
in Keras (https://github.com/secondmind-labs/trieste/blob/25d2a038fc1a74485337afac4fa45f29a4c4a311/trieste/models/keras/sampler.py#L171C59-L171C67), we have the same use-case and we should use the new ensure_positive function there as well

hstojic

done

Uri Granta added 4 commits January 3, 2025 10:23

Cap reparam sampling jitter

8ef3e46

minimum not maximum

e284c87

Add a few simple tests

7fd07e6

Oops, misunderstood point. Try again

ff62b0c

uri-granta changed the title ~~Cap reparam sampling jitter~~ Remove default reparam jitter but ensure positive variance Jan 3, 2025

Uri Granta added 2 commits January 5, 2025 14:28

Float, don't int

7ef0492

Add tests

97900aa

uri-granta marked this pull request as ready for review January 5, 2025 21:49

Uri Granta added 2 commits January 6, 2025 09:36

Leave other samplers alone

51adc46

Really leave them alone

bfccfd8

uri-granta commented Jan 6, 2025

uri-granta changed the title ~~Remove default reparam jitter but ensure positive variance~~ Remove default independent sampler jitter but ensure positive variance Jan 6, 2025

vpicheny reviewed Jan 6, 2025

Assert model variance is zero in test

0555a84

hstojic reviewed Jan 6, 2025

Uri Granta added 3 commits January 21, 2025 09:29

Merge remote-tracking branch 'origin/develop' into uri/cap_sampling_j…

b26fb7a

…itter

Adress review comments

c977e6f

Remove outdated test

6be2bd8

hstojic approved these changes Jan 21, 2025

uri-granta merged commit 2365c50 into develop Jan 21, 2025
12 checks passed

uri-granta deleted the uri/cap_sampling_jitter branch January 21, 2025 11:42
