Add Bayesian Optimisation to AB testing example #1250
Conversation
@ae-foster to get CI tests to run, you'll need to merge dev into your branch and then push.
Force-pushed d746259 to c08851a
Some arbitrary numerics decisions in here. Also it would be nice to make the GP-BO and iter_sample more "pyronic".
examples/contrib/oed/gp_bayes_opt.py (Outdated)
# transform x to an unconstrained domain
unconstrained_x_init = transform_to(self.constraints).inv(x_init)
unconstrained_x = torch.tensor(unconstrained_x_init, requires_grad=True)
minimizer = optim.LBFGS([unconstrained_x], max_eval=20)
Cap evals at 20 to avoid some numerical instability.
Should we add a reminder comment here to use LBFGS with line_search when pytorch/pytorch#8824 is reviewed and merged? It might be more stable.
Sure :) I didn't realize that was in the pipeline
def closure():
    minimizer.zero_grad()
    if (torch.log(torch.abs(unconstrained_x)) > 25.).any():
Somewhat arbitrary cutoff: necessary to prevent overflow during the LBFGS run.
examples/contrib/oed/gp_bayes_opt.py (Outdated)
# ax1.set_xlim(-1, 101)
# ax1.set_title("Find {}".format(xlabel))
# if with_title:
#     ax1.set_ylabel("Gaussian Process Regression")
Commented code is unsightly but useful for debugging. test_examples will not admit matplotlib in example code.
Feel free to use local imports for plotting dependencies:
def plot(self, gs, ...):
    from matplotlib import pyplot as plt
    ...
That's not perfect, but it's better than commented-out code 😉
You should probably just add a separate test for this and remove the test code here
I'll remove the code, keep it on a local branch
# No noise, just jitter for numerical stability
Kffnew[N, N] = end + self.jitter
# Heuristic to avoid adding degenerate points
if Kffnew.logdet() > -15.:
Arbitrary cutoff to prevent Kff from becoming singular.
pyro/contrib/gp/models/gpr.py (Outdated)
# Todo use pyro.sample
d = normal.Normal(torch.tensor(0.), torch.tensor(1.))
# Reparametrize explicitly - aids autograd
ynew = (loc + self.mean_function(xnew)) + d.sample()*cov.sqrt()
It would be nice to make this a true pyronic sampler if possible
Why can't you write torch.distributions.Normal(loc + self.mean_function(xnew), cov.sqrt()).rsample()?
Yes, I had it in my mind that that didn't work; it does.
examples/contrib/oed/gp_bayes_opt.py (Outdated)
X = torch.cat([self.gpmodel.X, X])
y = torch.cat([self.gpmodel.y, y])
self.gpmodel.set_data(X, y)
self.gpmodel.optimize()
Not very pyronic
Can we do some GP magic to make these updates cheaper than recomputing the full posterior?
Yes, it depends how much magic you want. Here https://github.com/uber/pyro/blob/dev/pyro/contrib/gp/models/gpr.py#L80 we could have cached Kff and just added the kernel computations we need. One step further is to use the magic of Schur complements https://en.wikipedia.org/wiki/Schur_complement to avoid having to invert the whole kernel matrix. I actually started implementing a Schur complement method for iter_sample, but in the end the existing solution works and reuses more existing code. In either case, we would have to refactor the internals of GPRegression.
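For context, a minimal sketch of the "cache Kff and just add the kernel computations we need" option; the helper name and arguments are illustrative rather than the PR's actual code, and it assumes the pyro.contrib.gp convention that kernel(X, Z) returns a cross-covariance matrix:

```python
import torch

def extend_kff(Kff, kernel, X, xnew, jitter=1e-6):
    # Append rows/columns for `xnew` to the cached kernel matrix of data `X`,
    # instead of recomputing the whole (N+M) x (N+M) matrix from scratch.
    k_cross = kernel(X, xnew)                                 # (N, M) cross-covariances
    k_new = kernel(xnew) + jitter * torch.eye(xnew.size(0))   # (M, M) block, jittered
    top = torch.cat([Kff, k_cross], dim=1)                    # (N, N+M)
    bottom = torch.cat([k_cross.t(), k_new], dim=1)           # (M, N+M)
    return torch.cat([top, bottom], dim=0)                    # (N+M, N+M)
```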
Let's open a separate issue about this, it seems like generally useful functionality for contrib.gp
I guess it is not expensive for Bayesian Optimization because num_data in BO is small. When num_data is large, I guess the most expensive operation is Cholesky(Kff).
I think the other important factor is how you plan to add points to the GP: all at once or drip by drip. In the all-at-once setting, the current code is optimal: you want to avoid explicitly inverting Kff, so using Cholesky plus trtrs is the best. On the other hand, if you are going drip by drip, you can invert the small kernel matrices formed from the new points, and then get the overall precision matrix using Schur complements. But yeah, I'll open an issue about this.
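For reference, the standard block-inverse identity behind the drip-by-drip option (textbook linear algebra, not code from this PR): adding one point with cross-covariance vector k_* and variance k_** gives

```latex
K = \begin{pmatrix} K_{ff} & k_* \\ k_*^\top & k_{**} \end{pmatrix},
\qquad
K^{-1} = \begin{pmatrix}
K_{ff}^{-1} + \tfrac{1}{s}\, K_{ff}^{-1} k_* k_*^\top K_{ff}^{-1} & -\tfrac{1}{s}\, K_{ff}^{-1} k_* \\[4pt]
-\tfrac{1}{s}\, k_*^\top K_{ff}^{-1} & \tfrac{1}{s}
\end{pmatrix},
\qquad s = k_{**} - k_*^\top K_{ff}^{-1} k_* ,
```

so each incremental point only costs solves against the cached Kff factor rather than a full refactorization.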
Nice work! I think the algorithmic content is pretty solid, most of my comments are about code organization.
pyro/contrib/gp/models/gpr.py (Outdated)
# Todo use pyro.sample
d = normal.Normal(torch.tensor(0.), torch.tensor(1.))
# Reparametrize explicitly - aids autograd
ynew = (loc + self.mean_function(xnew)) + d.sample()*cov.sqrt()
Why can't you write torch.distributions.Normal(loc + self.mean_function(xnew), cov.sqrt()).rsample()?
pyro/infer/trace_elbo.py (Outdated)
@@ -115,7 +115,7 @@ def loss_and_grads(self, model, guide, *args, **kwargs):
    if trainable_params and getattr(surrogate_elbo_particle, 'requires_grad', False):
        surrogate_loss_particle = -surrogate_elbo_particle / self.num_particles
-       surrogate_loss_particle.backward()
+       surrogate_loss_particle.backward(retain_graph=True)
This change has some performance and memory implications. I'll push a fix today as part of #1227 that exposes a differentiable_loss function that you can use when you need retain_graph=True.
Kk, I'll rebase from dev once that is merged.
Added in #1252
Let's remove this change and switch to using TraceEnum_ELBO below.
pyro/contrib/gp/models/gpr.py (Outdated)
:rtype: function
"""
# Make these visible in the inner function
global X, y, Kff, N
Instead of using global here and below, how about making sample_next take these variables as arguments and returning a curried function with functools.partial?

def sample_next(X, y, Kff, N, xnew):
    ...

return functools.partial(sample_next, X, y, Kff, N)
The reason I didn't do that was that we change X inside the inner function (by writing X = Xnew). I thought scoping rules would mean that that change wasn't saved, but honestly I didn't actually try it.
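The scoping concern is real for plain rebinding: a bare X = Xnew in the inner function creates a new local (and reading X first raises UnboundLocalError), while nonlocal, or mutating a container, makes the update persist. A toy illustration, not the PR's code:

```python
def make_counter():
    x = 0

    def bump():
        nonlocal x   # without this line, `x = x + 1` raises UnboundLocalError
        x = x + 1
        return x

    return bump

bump = make_counter()
assert bump() == 1
assert bump() == 2   # the rebinding persisted across calls
```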
Will LBFGS add many samples to the globals "X", "Y" during its optimization for sampler?
Yes it will, but I passed max_eval=20 to cap this.
You can also put these variables into a dictionary and mutate that instead of using global
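A minimal sketch of that suggestion, with the GP conditioning step elided and all names illustrative rather than the PR's actual iter_sample code:

```python
import torch

def make_sampler(X, y, Kff, N):
    state = {"X": X, "y": y, "Kff": Kff, "N": N}

    def sample_next(xnew):
        # (condition the GP posterior on state["X"], state["y"], state["Kff"] here)
        ynew = torch.zeros(xnew.shape[0])            # placeholder for the sampled values
        # mutating the shared dict persists across calls, no `global` needed
        state["X"] = torch.cat([state["X"], xnew])
        state["y"] = torch.cat([state["y"], ynew])
        state["N"] += xnew.shape[0]
        return ynew

    return sample_next
```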
examples/contrib/oed/gp_bayes_opt.py (Outdated)
X = torch.cat([self.gpmodel.X, X])
y = torch.cat([self.gpmodel.y, y])
self.gpmodel.set_data(X, y)
self.gpmodel.optimize()
Can we do some GP magic to make these updates cheaper than recomputing the full posterior?
examples/contrib/oed/gp_bayes_opt.py (Outdated)
:rtype: torch.Tensor
"""

if method == "Thompson":
Let's split this out into an acquisition function parameter instead of implementing acquisition functions inside acquire, so it's easier to experiment with different Monte Carlo acquisition functions.
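A rough sketch of the interface being suggested; opt_batch and the iter_sample call are assumptions about this PR's code, shown only to illustrate the shape of the refactor:

```python
def acquire(self, num_acquisitions, acquisition_fn):
    # the acquisition rule is a callable, so Thompson sampling, UCB, etc. are pluggable
    objective = acquisition_fn(self.gpmodel)             # differentiable objective to minimise
    return self.opt_batch(objective, num_acquisitions)   # hypothetical batch optimiser

def thompson_acquisition(gpmodel):
    # one posterior draw from the GP; minimising it is one Thompson-sampling step
    return gpmodel.iter_sample()
```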
examples/contrib/oed/gp_bayes_opt.py (Outdated)
from torch.distributions import transform_to


class GPBayesOptimizer:
We should be able to refactor this a bit into a pyro.optim.multi.MultiOptimizer so that it looks more like other Pyro optimizers. That way, when we experiment with other optimizers, e.g. gradient-based optimizers, it'll be easy to swap them out.
I think we're best to talk over this together: we need to implement a step function which would replace my run function. Would we also need to refactor the example to use pyro.param? Right now, it's not really true that after a step, the params are updated to new near-optimal values. But obviously I could add self.opt_differentiable(lambda x: self.gpmodel(x)[0]) into each step (minimize the current GP mean function).
examples/contrib/oed/gp_bayes_opt.py (Outdated)
# ax1.set_xlim(-1, 101)
# ax1.set_title("Find {}".format(xlabel))
# if with_title:
#     ax1.set_ylabel("Gaussian Process Regression")
You should probably just add a separate test for this and remove the test code here
examples/contrib/oed/gp_bayes_opt.py (Outdated)
class GPBayesOptimizer:
    """Performs Bayesian Optimization using a Gaussian Process as an
    emulator for the unknown function.
Once the code is a little more settled down, you should expand this docstring and add a couple usage examples
examples/contrib/oed/gp_bayes_opt.py (Outdated)
@@ -0,0 +1,164 @@
# import matplotlib.gridspec as gridspec
# import matplotlib.pyplot as plt
Remove these as discussed below
@ae-foster Do I understand correctly that LBFGS will add many samples to the globals "X", "Y" during its optimization for sampler, and this is how Thompson sampling works?
examples/contrib/oed/gp_bayes_opt.py (Outdated)
X = torch.cat([self.gpmodel.X, X])
y = torch.cat([self.gpmodel.y, y])
self.gpmodel.set_data(X, y)
self.gpmodel.optimize()
I guess it is not expensive for Bayesian Optimization because num_data in BO is small. When num_data is large, I guess the most expensive operation is Cholesky(Kff).
examples/contrib/oed/gp_bayes_opt.py (Outdated)
# transform x to an unconstrained domain
unconstrained_x_init = transform_to(self.constraints).inv(x_init)
unconstrained_x = torch.tensor(unconstrained_x_init, requires_grad=True)
minimizer = optim.LBFGS([unconstrained_x], max_eval=20)
Should we add a reminder comment here to use LBFGS with line_search when pytorch/pytorch#8824 is reviewed and merged? It might be more stable.
examples/contrib/oed/gp_bayes_opt.py (Outdated)
:rtype: tuple
"""

x_init = self.gpmodel.X[-1:].new_empty(1).uniform_(self.constraints.lower_bound,
nit: no need for X[-1:] here, just X is enough
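i.e. something like the following (assuming the interval constraint's upper_bound mirrors the lower_bound already visible in the snippet):

```python
x_init = self.gpmodel.X.new_empty(1).uniform_(self.constraints.lower_bound,
                                              self.constraints.upper_bound)
```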
examples/contrib/oed/gp_bayes_opt.py
Outdated
candidates = [] | ||
values = [] | ||
for j in range(num_candidates): | ||
x, y = self.find_a_candidate(differentiable, x_init) |
Should we use different init points for each candidate?
examples/contrib/oed/gp_bayes_opt.py (Outdated)
"""

# Initialize the return tensor
X = torch.zeros(num_acquisitions, *self.gpmodel.X.shape[1:])
nit: it is better to use self.gpmodel.X.new_empty here
examples/contrib/oed/gp_bayes_opt.py (Outdated)
"""
# transform x to an unconstrained domain
unconstrained_x_init = transform_to(self.constraints).inv(x_init)
unconstrained_x = torch.tensor(unconstrained_x_init, requires_grad=True)
nit: use new_tensor here
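e.g. (illustrative; new_tensor keeps the dtype and device of the source tensor rather than defaulting):

```python
unconstrained_x = unconstrained_x_init.new_tensor(unconstrained_x_init, requires_grad=True)
```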
pyro/contrib/gp/models/gpr.py (Outdated)
:rtype: function
"""
# Make these visible in the inner function
global X, y, Kff, N
Will LBFGS add many samples to the globals "X", "Y" during its optimization for sampler?
pyro/contrib/gp/models/gpr.py (Outdated)
conditioning on previously sampled values.
"""
if torch.isnan(xnew).any():
    raise ValueError("Cannot evaluate GP at value: {}".format(xnew))
nit: you might want to use warn_if_nan https://github.com/uber/pyro/blob/dev/pyro/util.py#L49
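e.g., assuming the helper keeps its (value, msg) signature from pyro/util.py:

```python
from pyro.util import warn_if_nan

warn_if_nan(xnew, msg="xnew passed to iter_sample")
```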
examples/contrib/oed/ab_test.py
Outdated
vi_parameters={ | ||
"guide": guide, | ||
"optim": optim.Adam({"lr": 0.0025}), | ||
"loss": Trace_ELBO(), |
Let's just use "loss": TraceEnum_ELBO(strict_enumeration_warning=False).differentiable_loss here instead.
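i.e. the vi_parameters dict above would become something like this (sketch; assumes the differentiable_loss method landing in #1252):

```python
vi_parameters={
    "guide": guide,
    "optim": optim.Adam({"lr": 0.0025}),
    "loss": TraceEnum_ELBO(strict_enumeration_warning=False).differentiable_loss,
},
```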
I have not reviewed the Thompson sampling algorithm because I am not familiar with it - but overall LGTM
Force-pushed 2972c32 to 059832a
We'll need #1252 merged for the CI to pass (currently getting the pytorch error on example). After that I think we're good to go.
Force-pushed 059832a to 0fffde8
@eb8680 I have the CI passing now :)
Squashed commits:
* Add Bayesian optimisation to AB test example
* Fix bug - cannot mean-centre like that
* Move GP optim to separate file, add batch acquisition
* Thompson sampling - messy version
* Tidy up BO for OED
* Lint
* Remove comments
* Run both estimated and true APE in example
* Clean up from rebase
* Cap LBFGS max_evals, some numerics magic for benefit of example
* Renamed arg
* Remove commented code
* Don't need to manually reparametrise
* Don't offer keyword arguments to acquire, call it acquire_thompson instead
* Address further review comments
* Make GPBayesOptimizer a MultiOptimizer; warn_if_nan
* Remove global keyword, replace with dict
* Put retain_graph=True in svi
* GP uses right loss
To complete our first OED example, I've added a GP-based Bayesian optimizer based on http://pyro.ai/examples/bo.html . This uses Thompson sampling to acquire new data in batches.
Summary of changes:
- ab_test.py - optimizing the APE using GPBayesOptimizer
- gp_bayes_opt.py - new code implementing the BO example with Thompson sampling: probably need to move this somewhere else, or can we merge it with existing BO code?
- gpr.py - added a method to lazily create a multivariate sample from a Gaussian process posterior

A rough usage sketch of the new optimizer is included below.
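Only GPRegression and Matern52 are existing Pyro APIs in this sketch; the GPBayesOptimizer constructor, acquire_thompson and the posterior-update call are assumptions about this PR's interface, and f is a toy stand-in for the estimated APE:

```python
import torch
import pyro.contrib.gp as gp
from gp_bayes_opt import GPBayesOptimizer   # examples/contrib/oed/gp_bayes_opt.py

def f(x):                                   # toy stand-in for the expensive objective
    return (x - 2.0) ** 2

X = torch.tensor([1.0, 4.0])
y = f(X)
gpmodel = gp.models.GPRegression(X, y, gp.kernels.Matern52(input_dim=1),
                                 noise=torch.tensor(0.1))
constraints = torch.distributions.constraints.interval(0., 5.)

bo = GPBayesOptimizer(constraints, gpmodel, num_acquisitions=2)   # assumed signature
for _ in range(5):
    Xnew = bo.acquire_thompson()       # batch of candidate designs via Thompson sampling
    # evaluate the objective at the new designs and refit the GP (method name assumed)
    bo.update_posterior(Xnew, f(Xnew))
```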