
Add Bayesian Optimisation to AB testing example #1250

Merged
merged 19 commits into dev from ab-test-bayes-opt
Jul 25, 2018
Conversation

@ae-foster ae-foster (Contributor) commented Jul 20, 2018

To complete our first OED example, I've added a GP-based Bayesian optimizer based on http://pyro.ai/examples/bo.html. It uses Thompson sampling to acquire new data in batches.

Summary of changes:

  • ab_test.py - optimizing the APE using GPBayesOptimizer
  • gp_bayes_opt.py - new code implementing the BO example with Thompson sampling: probably need to move this somewhere else or can we merge it with existing BO code?
  • gpr.py - added a method to lazily create a multivariate sample from a Gaussian process posterior
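Since the acquisition strategy may be unfamiliar: in batch Thompson sampling we draw independent sample paths from the GP posterior and minimize each one, yielding a batch of new evaluation points. A minimal numpy sketch over a finite candidate grid (a toy stand-in for the PR's lazy, continuous-domain sampler; all names here are illustrative):

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0):
    # Squared-exponential kernel between two sets of 1-d points
    d = X1[:, None] - X2[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def thompson_batch(X_obs, y_obs, candidates, num_draws=4, jitter=1e-6, seed=0):
    # Posterior mean and covariance of the GP over the candidate grid
    rng = np.random.default_rng(seed)
    Kff = rbf_kernel(X_obs, X_obs) + jitter * np.eye(len(X_obs))
    Kfs = rbf_kernel(X_obs, candidates)
    Kss = rbf_kernel(candidates, candidates)
    A = np.linalg.solve(Kff, Kfs)            # Kff^{-1} Kfs
    mean = A.T @ y_obs
    cov = Kss - Kfs.T @ A
    # Symmetrize and add jitter so the Cholesky factorization succeeds
    cov = 0.5 * (cov + cov.T) + jitter * np.eye(len(candidates))
    L = np.linalg.cholesky(cov)
    # Each independent posterior draw proposes its own point:
    # the minimizer of that draw over the candidate set
    batch = []
    for _ in range(num_draws):
        draw = mean + L @ rng.standard_normal(len(candidates))
        batch.append(candidates[np.argmin(draw)])
    return np.array(batch)

X_obs = np.array([0.0, 1.0, 2.0])
y_obs = np.array([1.0, 0.2, 0.8])
cands = np.linspace(0.0, 2.0, 21)
batch = thompson_batch(X_obs, y_obs, cands)
```

The PR instead samples the GP lazily (iter_sample) over a continuous domain and minimizes each draw with LBFGS, but the acquisition logic is the same.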

@ae-foster ae-foster added the WIP label Jul 20, 2018
@ae-foster ae-foster requested review from eb8680 and fehiepsi July 20, 2018 01:48
@fritzo fritzo (Member) commented Jul 20, 2018

@ae-foster to get CI tests to run, you'll need to merge dev into your branch and then push.

@ae-foster ae-foster force-pushed the ab-test-bayes-opt branch from d746259 to c08851a on July 20, 2018 18:26
@ae-foster ae-foster left a comment:

There are some arbitrary numerics decisions in here. It would also be nice to make the GP-BO and iter_sample more "pyronic".

    # transform x to an unconstrained domain
    unconstrained_x_init = transform_to(self.constraints).inv(x_init)
    unconstrained_x = torch.tensor(unconstrained_x_init, requires_grad=True)
    minimizer = optim.LBFGS([unconstrained_x], max_eval=20)
ae-foster (Contributor Author):

Cap evals at 20 to avoid some numerical instability
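For context, the transform-to-unconstrained trick in the snippet above can be illustrated without torch at all: reparametrize the constrained variable through an invertible map and optimize the unconstrained one. A dependency-free sketch, using a sigmoid for a hypothetical (0, 1) interval constraint in place of Pyro's transform_to:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Minimize f(x) = (x - 0.8)^2 subject to 0 < x < 1 by setting
# x = sigmoid(z) and running unconstrained gradient descent on z.
def minimize_constrained(lr=0.5, steps=300, z0=0.0):
    z = z0
    for _ in range(steps):
        x = sigmoid(z)
        dfdx = 2.0 * (x - 0.8)   # gradient of the objective in x
        dxdz = x * (1.0 - x)     # derivative of the sigmoid
        z -= lr * dfdx * dxdz    # chain rule: df/dz
    # Note z itself is unbounded, which mirrors the overflow the
    # PR guards against with its log|x| > 25 cutoff in the closure.
    return sigmoid(z)

x_opt = minimize_constrained()
```

The optimum in x always satisfies the constraint by construction, which is the whole point of optimizing in the unconstrained space.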

Member:

Should we add a remind comment here to use LBFGS with line_search when pytorch/pytorch#8824 is reviewed and merged? It might be more stable.

ae-foster (Contributor Author):

Sure :) I didn't realize that was in the pipeline


    def closure():
        minimizer.zero_grad()
        if (torch.log(torch.abs(unconstrained_x)) > 25.).any():
ae-foster (Contributor Author):

Somewhat arbitrary cutoff, but necessary to prevent overflow during the LBFGS run.

    # ax1.set_xlim(-1, 101)
    # ax1.set_title("Find {}".format(xlabel))
    # if with_title:
    #     ax1.set_ylabel("Gaussian Process Regression")
ae-foster (Contributor Author):

Commented code is unsightly but useful for debugging. test_examples will not admit matplotlib in example code.

Member:

Feel free to use local imports for plotting dependencies:

def plot(self, gs, ...):
    from matplotlib import pyplot as plt
    ...

That's not perfect, but it's better than commented-out code 😉

Member:

You should probably just add a separate test for this and remove the test code here

ae-foster (Contributor Author):

I'll remove the code, keep it on a local branch

    # No noise, just jitter for numerical stability
    Kffnew[N, N] = end + self.jitter
    # Heuristic to avoid adding degenerate points
    if Kffnew.logdet() > -15.:
ae-foster (Contributor Author):

Arbitrary cutoff to prevent Kff from becoming singular.

    # Todo use pyro.sample
    d = normal.Normal(torch.tensor(0.), torch.tensor(1.))
    # Reparametrize explicitly - aids autograd
    ynew = (loc + self.mean_function(xnew)) + d.sample()*cov.sqrt()
ae-foster (Contributor Author):

It would be nice to make this a true pyronic sampler if possible

Member:

Why can't you write torch.distributions.Normal(loc + self.mean_function(xnew), cov.sqrt()).rsample()?

ae-foster (Contributor Author):

Yes, I had it in my mind that that didn't work, but it does.

    X = torch.cat([self.gpmodel.X, X])
    y = torch.cat([self.gpmodel.y, y])
    self.gpmodel.set_data(X, y)
    self.gpmodel.optimize()
ae-foster (Contributor Author):

Not very pyronic

Member:

Can we do some GP magic to make these updates cheaper than recomputing the full posterior?

ae-foster (Contributor Author):

Yes, it depends how much magic you want. Here https://github.com/uber/pyro/blob/dev/pyro/contrib/gp/models/gpr.py#L80 we could cache Kff and just add the kernel computations we need. One step further is to use the magic of Schur complements https://en.wikipedia.org/wiki/Schur_complement to avoid inverting the whole kernel matrix. I actually started implementing a Schur complement method for iter_sample, but in the end the existing solution works and reuses more existing code. In either case, we would have to refactor the internals of GPRegression.
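The Schur complement idea mentioned above can be sketched concretely: if the inverse of the old kernel matrix A is already known, the inverse of the grown matrix [[A, B], [B.T, C]] only requires inverting the small Schur complement S = C - B.T A^{-1} B, not the whole matrix. A numpy illustration (names hypothetical):

```python
import numpy as np

def grow_inverse(A_inv, B, C):
    # Block inverse via the Schur complement of A:
    # only S (which is small when few points are added) is inverted.
    S = C - B.T @ A_inv @ B
    S_inv = np.linalg.inv(S)
    top_left = A_inv + A_inv @ B @ S_inv @ B.T @ A_inv
    top_right = -A_inv @ B @ S_inv
    return np.block([[top_left, top_right],
                     [top_right.T, S_inv]])

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
K = M @ M.T + 5.0 * np.eye(5)   # SPD stand-in for a kernel matrix
A, B, C = K[:3, :3], K[:3, 3:], K[3:, 3:]
K_inv = grow_inverse(np.linalg.inv(A), B, C)
```

Here the result matches np.linalg.inv(K) while only ever inverting 3x3 and 2x2 blocks.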

Member:

Let's open a separate issue about this, it seems like generally useful functionality for contrib.gp

fehiepsi (Member), Jul 21, 2018:

I guess it is not expensive for Bayesian Optimization because num_data in BO is small. When num_data is large, I guess the most expensive operation is Cholesky(Kff).

ae-foster (Contributor Author):

I think the other important factor is how you plan to add points to the GP: all at once or drip by drip. In the all-at-once setting, the current code is optimal; you want to avoid explicitly inverting Kff, so using Cholesky plus trtrs is best. On the other hand, if you are going drip by drip, you can invert the small kernel matrices formed from the new points and then get the overall precision matrix using Schur complements. But yes, I'll open an issue about this.
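The "Cholesky plus trtrs" pattern referred to above is worth spelling out: to compute alpha = Kff^{-1} y, factor Kff = L L^T once and do two triangular solves rather than forming the explicit (less stable) inverse. A numpy sketch, where np.linalg.solve stands in for a dedicated triangular solve such as torch.trtrs or scipy.linalg.solve_triangular:

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
Kff = M @ M.T + 4.0 * np.eye(4)   # SPD stand-in for a kernel matrix
y = rng.standard_normal(4)

L = np.linalg.cholesky(Kff)       # Kff = L @ L.T
w = np.linalg.solve(L, y)         # forward solve:  L w = y
alpha = np.linalg.solve(L.T, w)   # backward solve: L^T alpha = w
```

The two solves recover exactly Kff^{-1} y, and the Cholesky factor can be reused for every right-hand side.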


@eb8680 eb8680 left a comment:

Nice work! I think the algorithmic content is pretty solid, most of my comments are about code organization.


    @@ -115,7 +115,7 @@ def loss_and_grads(self, model, guide, *args, **kwargs):
        if trainable_params and getattr(surrogate_elbo_particle, 'requires_grad', False):
            surrogate_loss_particle = -surrogate_elbo_particle / self.num_particles
    -       surrogate_loss_particle.backward()
    +       surrogate_loss_particle.backward(retain_graph=True)
Member:

This change has some performance and memory implications. I'll push a fix today as part of #1227 that exposes a differentiable_loss function that you can use when you need retain_graph=True

ae-foster (Contributor Author):

OK, I'll rebase from dev once that is merged.

Member:

Added in #1252

Member:

Let's remove this change and switch to using TraceEnum_ELBO below

    :rtype: function
    """
    # Make these visible in the inner function
    global X, y, Kff, N
Member:

Instead of using global here and below, how about making sample_next take these variables as arguments and returning a curried function with functools.partial?

def sample_next(X, y, Kff, N, xnew):
    ...

return functools.partial(sample_next, X, y, Kff, N)

ae-foster (Contributor Author):

The reason I didn't do that was that we change X inside the inner function (by writing X = Xnew). I thought the scoping rules would mean that change wasn't saved, but honestly I didn't actually try it.

Member:

Will LBFGS add many samples to the globals X and y during its optimization of the sampler?

ae-foster (Contributor Author):

Yes it will, but I passed max_eval=20 to cap this.

Member:

You can also put these variables into a dictionary and mutate that instead of using global
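The dict suggestion works because, while a closure cannot rebind a name from the enclosing scope (without nonlocal), it can freely mutate a dict it closes over. A minimal sketch with hypothetical names (the 2.0 * xnew "GP draw" is a toy stand-in):

```python
def make_sampler():
    # Growing state like X and y lives in one shared dict
    # instead of module-level globals.
    state = {"X": [], "y": []}

    def sample_next(xnew):
        ynew = 2.0 * xnew           # hypothetical stand-in for a GP draw
        state["X"].append(xnew)     # mutation persists across calls
        state["y"].append(ynew)
        return ynew

    return sample_next, state

sampler, state = make_sampler()
sampler(1.0)
sampler(3.0)
print(state["X"])  # [1.0, 3.0]
```

Each call appends to the shared state, giving the same accumulate-as-you-sample behavior as the global version without polluting the module namespace.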


    :rtype: torch.Tensor
    """

    if method == "Thompson":
Member:

Let's split this out into an acquisition function parameter instead of implementing acquisition functions inside acquire, so it's easier to experiment with different Monte Carlo acquisition functions.
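A sketch of the suggested refactor (all names hypothetical): the acquisition strategy becomes a function passed into acquire, so new Monte Carlo acquisition functions can be swapped in without touching the acquisition loop:

```python
def acquire(candidates, posterior_draws, acquisition_fn):
    # Score every candidate under the supplied acquisition function
    # and return the minimizer (we minimize, as with APE in the example).
    scores = [acquisition_fn(d) for d in posterior_draws]
    best = min(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best]

def thompson(draw):
    # Thompson sampling scores a candidate by a single posterior draw
    return draw

cands = [0.0, 0.5, 1.0]
draws = [0.9, 0.1, 0.4]   # toy posterior draws at each candidate
best = acquire(cands, draws, thompson)
print(best)  # 0.5
```

Swapping in a different acquisition function is then a one-argument change rather than a new branch inside acquire.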

    from torch.distributions import transform_to


    class GPBayesOptimizer:
Member:

We should be able to refactor this a bit into a pyro.optim.multi.MultiOptimizer so that it looks more like other Pyro optimizers. That way when we experiment with other optimizers e.g. gradient-based optimizers it'll be easy to swap them out

ae-foster (Contributor Author):

I think it's best to talk this over together: we need to implement a step function which would replace my run function. Would we also need to refactor the example to use pyro.param? Right now, it's not really true that after a step the params are updated to new near-optimal values, but obviously I could add self.opt_differentiable(lambda x: self.gpmodel(x)[0]) into each step (minimizing the current GP mean function).


    class GPBayesOptimizer:
        """Performs Bayesian Optimization using a Gaussian Process as an
        emulator for the unknown function.
Member:

Once the code is a little more settled down, you should expand this docstring and add a couple usage examples

    @@ -0,0 +1,164 @@
    # import matplotlib.gridspec as gridspec
    # import matplotlib.pyplot as plt
Member:

Remove these as discussed below

@fehiepsi fehiepsi left a comment:

@ae-foster Do I understand correctly that LBFGS will add many samples to the globals X and y during its optimization of the sampler, and that this is how Thompson sampling works?

    :rtype: tuple
    """

    x_init = self.gpmodel.X[-1:].new_empty(1).uniform_(self.constraints.lower_bound,
Member:

nit: no need for X[-1:] here, just X is enough

    candidates = []
    values = []
    for j in range(num_candidates):
        x, y = self.find_a_candidate(differentiable, x_init)
Member:

Should we use different init points for each candidate?

"""

# Initialize the return tensor
X = torch.zeros(num_acquisitions, *self.gpmodel.X.shape[1:])
Member:

nit: it is better to use self.gpmodel.new_empty here

"""
# transform x to an unconstrained domain
unconstrained_x_init = transform_to(self.constraints).inv(x_init)
unconstrained_x = torch.tensor(unconstrained_x_init, requires_grad=True)
Member:

nit: use new_tensor here


    conditioning on previously sampled values.
    """
    if torch.isnan(xnew).any():
        raise ValueError("Cannot evaluate GP at value: {}".format(xnew))
Member:

nit: you might want to use warn_if_nan https://github.com/uber/pyro/blob/dev/pyro/util.py#L49

    vi_parameters={
        "guide": guide,
        "optim": optim.Adam({"lr": 0.0025}),
        "loss": Trace_ELBO(),
Member:

Let's just use "loss": TraceEnum_ELBO(strict_enumeration_warning=False).differentiable_loss here instead

fehiepsi previously approved these changes Jul 24, 2018

@fehiepsi fehiepsi left a comment:

I have not reviewed the Thompson sampling algorithm because I am not familiar with it, but overall LGTM.

@ae-foster ae-foster (Contributor Author):

We'll need #1252 merged for the CI to pass (currently getting the pytorch error on example). After that I think we're good to go

@ae-foster ae-foster force-pushed the ab-test-bayes-opt branch from 059832a to 0fffde8 on July 25, 2018 01:45
@ae-foster ae-foster (Contributor Author):

@eb8680 the CI is passing now :)

@eb8680 eb8680 merged commit b99cdf4 into dev Jul 25, 2018
@eb8680 eb8680 deleted the ab-test-bayes-opt branch July 25, 2018 20:56
breakdawn pushed a commit to breakdawn/pyro that referenced this pull request Aug 15, 2018
* Add Bayesian optimisation to AB test example

* Fix bug- cannot mean-centre like that

* Move GP optim to separate file, add batch acquisition

* Thompson sampling - messy version

* Tidy up BO for OED

* Lint

* Remove comments

* Run both estimated and true APE in example

* Clean up from rebase

* Cap LBFGS max_evals, some numerics magic for benefit of example

* Renamed arg

* Remove commented code

* Don't need to manually reparametrise

* Don't offer keyword arguments to acquire, call it acquire_thompson instead

* Address further review comments

* Make GPBayesOptimizer a MultiOptimizer; warn_if_nan

* Remove global keyword, replace with dict

* Put retain_graph=True in svi

* GP uses right loss