Update blackbox likelihood example #631
Conversation
Check out this pull request on ReviewNB: see visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB
Force-pushed from 03ec340 to 26d2f9d
- Make Op more generic (accepts distinct inputs, returns vectorized loglikelihood), for better transfer learning
- Build Ops more iteratively, and show useful debugging strategies
- Use CustomDist instead of Potential (added a short section at the end showing Potential); a rough sketch of this pattern follows after this list
- Actually make use of finite differences as mentioned in the text
- Remove mention of old Cython example and reference new JAX example
- Fix some cross-references (unclear success)
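For anyone skimming this thread, here is a minimal, hedged sketch of the CustomDist-based pattern the PR moves to. It is not the notebook's exact code: the black-box function `my_loglike`, the linear-model parameters `m`, `c`, `sigma`, `x`, and the model name `no_grad_model` are illustrative assumptions.

```python
import numpy as np
import pymc as pm
import pytensor.tensor as pt
from pytensor.graph import Apply, Op


def my_loglike(m, c, sigma, x, data):
    # Pretend this is a black-box (non-PyTensor) function: the pointwise
    # Gaussian log-density of `data` around the straight line m * x + c
    return -0.5 * ((data - (m * x + c)) / sigma) ** 2 - np.log(sigma) - 0.5 * np.log(2 * np.pi)


class LogLike(Op):
    def make_node(self, m, c, sigma, x, data):
        # Convert raw inputs (floats, arrays) into symbolic variables
        m, c, sigma, x, data = map(pt.as_tensor, (m, c, sigma, x, data))
        # A single output: the vector of pointwise log-likelihoods
        return Apply(self, [m, c, sigma, x, data], [data.type()])

    def perform(self, node, inputs, output_storage):
        m, c, sigma, x, data = inputs  # numpy values at runtime
        output_storage[0][0] = np.asarray(my_loglike(m, c, sigma, x, data))


loglike_op = LogLike()


def custom_dist_loglike(data, m, c, sigma, x):
    # The observed value is always the first argument of a CustomDist logp
    return loglike_op(m, c, sigma, x, data)


x = np.linspace(0, 9, 10)
data = 0.4 * x + 3.0 + np.random.default_rng(42).normal(scale=1.0, size=x.size)

with pm.Model() as no_grad_model:
    m = pm.Uniform("m", lower=-10.0, upper=10.0)
    c = pm.Uniform("c", lower=-10.0, upper=10.0)
    # CustomDist instead of Potential: the likelihood stays tied to the observed data
    pm.CustomDist("likelihood", m, c, 1.0, x, observed=data, logp=custom_dist_loglike)
```

The Potential alternative mentioned in the list would instead wrap the same Op output as `pm.Potential("likelihood", loglike_op(m, c, 1.0, x, data).sum())`, at the cost of losing the explicit link between the likelihood and the observed variable.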
Force-pushed from 26d2f9d to ba8d56d
Is the pre-commit.ci failure to build known?
jessegrabowski commented on 2024-02-02T22:53:25Z

One problem I have with this framing is that it blurs the purpose of this tutorial a bit. Are we learning how to implement a likelihood (this is what the name suggests), or are we learning how to wrap arbitrary PyTensor Ops, using the specific example of a likelihood?

I would prefer the framing of the introduction to be very squarely on automatic logp inference,

I'd like to see a separate tutorial called "PyTensor for PyMC Users" that covers how to wrap arbitrary Python code for PyTensor use, then have this one focus more explicitly on the difference between

To help most users, I think this tutorial should be a flowchart like: Can you write down the generative process? -->

ricardoV94 commented on 2024-02-14T11:20:41Z

I think we need a proper notebook on CustomDist with dist behavior, probably a core notebook, not even pymc-examples.

This tutorial has always been about a black-box likelihood, originally a complicated Cython function, that got removed in the last hackathon because it was too complicated for devs to maintain. Now we have this baby numpy alternative here, which I think is fine.

The difference between a black-box likelihood/model and individual Ops is thin. I don't want to delve into that here because it requires more PyTensor exposition. I just wanted to make this NB more useful than it was before, not to change the pedagogy.

ricardoV94 commented on 2024-02-14T11:35:14Z

May be good to emphasize that we could black box the model part only, and still use a
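A hedged sketch of what that last (truncated) suggestion might look like: only the forward model is a black box, and its output feeds an ordinary pm.Normal likelihood, so no custom logp is needed. The ForwardModel Op below is an illustrative assumption, not code from the notebook.

```python
import numpy as np
import pymc as pm
import pytensor.tensor as pt
from pytensor.graph import Apply, Op


class ForwardModel(Op):
    """Black-box deterministic model: here simply m * x + c computed in numpy."""

    def make_node(self, m, c, x):
        m, c, x = map(pt.as_tensor, (m, c, x))
        return Apply(self, [m, c, x], [pt.dvector()])

    def perform(self, node, inputs, output_storage):
        m, c, x = inputs
        output_storage[0][0] = np.asarray(m * x + c, dtype="float64")


forward_op = ForwardModel()

x = np.linspace(0, 9, 10)
data = 0.4 * x + 3.0

with pm.Model():
    m = pm.Uniform("m", lower=-10.0, upper=10.0)
    c = pm.Uniform("c", lower=-10.0, upper=10.0)
    mu = forward_op(m, c, x)  # only the model part is black-boxed
    # ...while the likelihood stays a regular PyMC distribution
    pm.Normal("likelihood", mu=mu, sigma=1.0, observed=data)
```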
jessegrabowski commented on 2024-02-02T22:53:26Z

Line #10: raise TypeError(f"Invalid input type to loglike: {type(param)}")

If we don't like my idea to split the example, we should find a less contrived example imo.

ricardoV94 commented on 2024-02-14T11:28:05Z

I think a contrived example is fine. Before, the example failed because the parameters were passed as a vector and there was an indexing operation. Only a sufficiently complex example will be non-contrived, and that will necessarily be less readable. In the JAX example I did try this by writing an HMM with jax.scan and making use of vmap (of course you can do it in PyTensor, that was before

There was already pushback there that the example was too complex and detracted from the point.

https://www.pymc.io/projects/examples/en/latest/howto/wrapping_jax_function.html
jessegrabowski commented on 2024-02-02T22:53:27Z

Can we write out "gradients"? And link to some docs about NUTS for why it matters? Trying to think about new users coming into this.

ricardoV94 commented on 2024-02-14T11:28:28Z

Yeah, definitely.

ricardoV94 commented on 2024-03-13T11:15:09Z

Is the part mentioned earlier not enough?

> Another issue is that if you want to be able to use the gradient-based step samplers like {term}`NUTS` and {term}`Hamiltonian Monte Carlo`, then your model/likelihood needs a gradient to be defined. If you have a model that is defined as a set of PyTensor operators then this is no problem - internally it will be able to do automatic differentiation - but if your model is essentially a "black box" then you won't necessarily know what the gradients are.
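To make the quoted point concrete, a small self-contained sketch (a stand-in Op, not the notebook's code) of how asking PyTensor for gradients fails until a grad() method is defined:

```python
import numpy as np
import pytensor
import pytensor.tensor as pt
from pytensor.graph import Apply, Op


class LogLikeNoGrad(Op):
    """Black-box loglike with no grad() defined (illustrative stand-in)."""

    def make_node(self, m, c, sigma, x, data):
        m, c, sigma, x, data = map(pt.as_tensor, (m, c, sigma, x, data))
        return Apply(self, [m, c, sigma, x, data], [pt.dvector()])

    def perform(self, node, inputs, output_storage):
        m, c, sigma, x, data = inputs
        output_storage[0][0] = -0.5 * ((data - (m * x + c)) / sigma) ** 2


m = pt.scalar("m")
c = pt.scalar("c")
x = np.linspace(0, 9, 10)
data = 0.4 * x + 3.0

out = LogLikeNoGrad()(m, c, 1.0, x, data)

try:
    # This is exactly what NUTS/HMC need; it fails until grad() is implemented
    pytensor.grad(out.sum(), wrt=[m, c])
except Exception as err:
    print(type(err).__name__, err)
```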
jessegrabowski commented on 2024-02-02T22:53:28Z

Line #5: def make_node(self, m, c, sigma, x, data):

Type annotations would be helpful I think, and it would help to highlight the difference between

ricardoV94 commented on 2024-02-14T11:28:40Z

Agree.

ricardoV94 commented on 2024-03-13T11:19:38Z

On revisiting this, not sure we can do much with the
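One possible shape of the annotated make_node being discussed. The annotations (including the TensorLike alias defined here) are an illustration of the suggestion, not what the notebook ended up using:

```python
import numpy as np
import pytensor.tensor as pt
from pytensor.graph import Apply, Op
from pytensor.tensor import TensorVariable

# Illustrative alias (not a PyTensor type): anything pt.as_tensor can convert
TensorLike = float | np.ndarray | TensorVariable


class LogLike(Op):
    def make_node(
        self,
        m: TensorLike,      # slope (scalar)
        c: TensorLike,      # intercept (scalar)
        sigma: TensorLike,  # observation noise (scalar)
        x: TensorLike,      # covariates (vector)
        data: TensorLike,   # observations (vector)
    ) -> Apply:
        # as_tensor accepts plain numbers/arrays as well as symbolic variables,
        # which is part of what the annotations are meant to highlight
        inputs = [pt.as_tensor(v) for v in (m, c, sigma, x, data)]
        return Apply(self, inputs, [pt.dvector()])

    def perform(self, node: Apply, inputs: list, output_storage: list) -> None:
        m, c, sigma, x, data = inputs
        output_storage[0][0] = -0.5 * ((data - (m * x + c)) / sigma) ** 2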
jessegrabowski commented on 2024-02-02T22:53:28Z

Line #12: # use a Potential to "call" the Op and include it in the logp computation

ricardoV94 commented on 2024-03-13T11:25:22Z

Fixed.
jessegrabowski commented on 2024-02-02T22:53:29Z

"if we ask the model to compute gradients of the log-likelihood"? I don't think it's immediately obvious what

ricardoV94 commented on 2024-03-13T11:26:42Z

Fixed.
jessegrabowski commented on 2024-02-02T22:53:30Z

I am really really hesitant to use finite difference gradients in an example, because it suggests that this is a good idea. I think it would be better to show the pencil pushing to get analytic gradients.

ricardoV94 commented on 2024-02-14T11:33:08Z

I know where you are coming from; I also see users trying to get out of PyTensor/math too soon.

Still, for simplicity's sake I would keep it. I think the example with the closed-form solution was less informative. Also, finite differences are a great gradient debugging tool (and what verify_grad does), so it's nice to show it. I can emphasize more strongly that analytic expressions are usually preferable.

ricardoV94 commented on 2024-03-13T11:36:49Z

Rephrased text to be less positive and added an explicit caution section.
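For context, a bare-bones sketch of the kind of finite-difference gradient being debated: a hand-rolled central difference, which is only an assumption about the notebook's approach. Analytic gradients are usually preferable; finite differences are mainly useful for debugging (e.g. what pytensor.gradient.verify_grad relies on).

```python
import numpy as np


def my_loglike(m, c, sigma, x, data):
    # Black-box pointwise Gaussian log-likelihood of a straight line
    return -0.5 * ((data - (m * x + c)) / sigma) ** 2 - np.log(sigma) - 0.5 * np.log(2 * np.pi)


def finite_diff_grads(m, c, sigma, x, data, eps=1e-6):
    """Central-difference gradients of the summed loglike w.r.t. m and c.

    Caution: the step size must be tuned, accuracy is limited, and each
    parameter costs two extra likelihood evaluations per gradient.
    """
    grads = []
    for i, value in enumerate([m, c]):
        params_plus = [m, c]
        params_minus = [m, c]
        params_plus[i] = value + eps
        params_minus[i] = value - eps
        f_plus = my_loglike(*params_plus, sigma, x, data).sum()
        f_minus = my_loglike(*params_minus, sigma, x, data).sum()
        grads.append((f_plus - f_minus) / (2 * eps))
    return grads  # [d loglike / dm, d loglike / dc]


x = np.linspace(0, 9, 10)
data = 0.4 * x + 3.0
print(finite_diff_grads(0.5, 2.0, 1.0, x, data))
```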
jessegrabowski commented on 2024-02-02T22:53:31Z

2nd link should go to https://en.wikipedia.org/wiki/Pushforward_(differential) instead of Jacobian perhaps.

ricardoV94 commented on 2024-02-14T11:35:58Z

Thanks.
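For readers following along: what Op.grad is expected to return for each input is the vector-Jacobian product of the incoming output gradient with that input's Jacobian (reverse-mode autodiff), not the full Jacobian itself. Schematically:

```latex
% Output gradient \bar{y} is supplied by PyTensor; one product is returned per input x_i
\bar{x}_i = \bar{y}^{\top}\,\frac{\partial y}{\partial x_i}
```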
jessegrabowski commented on 2024-02-02T22:53:32Z

Line #43: pytensor.gradient.grad_not_implemented(self, 4, data),

I actually like this, I've never seen the returns written out one at a time like this. But that's also a negative, because 99% of the time you'll just do g * outputs.

ricardoV94 commented on 2024-02-14T11:37:08Z

That's a big part of what I wanted to show in the updated NB. Users always ask about multiple inputs / outputs when referring to this NB / attempting their own Op.
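A rough sketch of the grad-method pattern being discussed, with one return value per input and grad_not_implemented for the inputs we do not differentiate through. Both the gradient Op loglikegrad_op and the input ordering are assumptions for this sketch, not the notebook's exact code (LogLike is the Op sketched near the top of this thread):

```python
import pytensor.tensor as pt
from pytensor.gradient import grad_not_implemented


class LogLikeWithGrad(LogLike):
    def grad(self, inputs, output_gradients):
        m, c, sigma, x, data = inputs
        # loglikegrad_op would be another Op returning d(loglike)/dm and d(loglike)/dc
        # (e.g. via finite differences, as above); assumed to exist for this sketch
        grad_wrt_m, grad_wrt_c = loglikegrad_op(m, c, sigma, x, data)
        [out_grad] = output_gradients
        # One entry per input, in the same order as in make_node
        return [
            pt.sum(out_grad * grad_wrt_m),         # wrt m
            pt.sum(out_grad * grad_wrt_c),         # wrt c
            grad_not_implemented(self, 2, sigma),  # wrt sigma (not needed)
            grad_not_implemented(self, 3, x),      # wrt x (not needed)
            grad_not_implemented(self, 4, data),   # wrt data (not needed)
        ]
```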
jessegrabowski commented on 2024-02-02T22:53:32Z

Line #12: # use a Potential to "call" the Op and include it in the logp computation

ricardoV94 commented on 2024-03-13T11:45:11Z

Fixed.
jessegrabowski commented on 2024-02-02T22:53:33Z

The models as written so far wouldn't have any forward sampling capability either, right? It might be nice to highlight that, the same way you highlighted that

ricardoV94 commented on 2024-03-13T11:46:30Z

Not sure if I want to emphasize that with an example. Feel it might distract from the main topic.
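For readers wondering what the "forward sampling" point refers to: a CustomDist defined only through logp cannot be drawn from, so prior/posterior predictive sampling fails unless a random (or dist) implementation is also supplied. A hedged sketch of the general idea; the random signature shown follows the pm.CustomDist docs as I understand them, and the helper names are assumptions:

```python
import numpy as np
import pymc as pm


def line_logp(value, m, c, sigma, x):
    # Pointwise Gaussian logp of value around m * x + c (stands in for the black-box Op)
    return -0.5 * ((value - (m * x + c)) / sigma) ** 2 - pm.math.log(sigma)


def line_random(m, c, sigma, x, rng=None, size=None):
    # Forward draw: without this, pm.sample_prior_predictive on the model would fail
    return rng.normal(loc=m * x + c, scale=sigma, size=size)


x = np.linspace(0, 9, 10)
data = 0.4 * x + 3.0

with pm.Model():
    m = pm.Uniform("m", lower=-10.0, upper=10.0)
    c = pm.Uniform("c", lower=-10.0, upper=10.0)
    pm.CustomDist(
        "likelihood", m, c, 1.0, x, observed=data, logp=line_logp, random=line_random
    )
    prior = pm.sample_prior_predictive(100)
```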
> Now, let's use this Op to repeat the example shown above. To do this let's create some data containing a straight line with additive Gaussian noise (with a mean of zero and a standard deviation of `sigma`). For simplicity we set {class}`~pymc.Uniform` prior distributions on the gradient and y-intercept. As we've not set the `grad()` method of the Op, PyMC will not be able to use the gradient-based samplers, so will fall back to using the {class}`pymc.Slice` sampler.
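As a rough illustration of that quoted paragraph, reusing the `no_grad_model` sketched near the top of this thread (the comments paraphrase the expected behaviour; they are not captured sampler logs):

```python
import pymc as pm

# `no_grad_model` is the CustomDist model sketched earlier; its likelihood Op has
# no grad(), so NUTS cannot be used
with no_grad_model:
    # PyMC falls back to a gradient-free step method (Slice) for m and c
    idata = pm.sample(1000, tune=1000)
```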
The pymc.Slice isn't rendered as a link, but the reference is right here and should be left as is. The issue is on the PyMC API docs.

I have no idea, I haven't tracked the latest PRs and wasn't even aware we had it (or what it does).

Similar thinking on my end. If it really is a black box, I think the recommendation should go more towards using samplers that don't require gradients (several of which are available in PyMC already, so it doesn't complicate anything really). Here, however, it isn't, so we can go ahead and also illustrate how to extend the Op with the grad method.
I added some thoughts on NB review
Force-pushed from 6778109 to 1368976
📚 Documentation preview 📚: https://pymc-examples--631.org.readthedocs.build/en/631/
Link to NB: https://pymcio--631.org.readthedocs.build/projects/examples/en/631/howto/blackbox_external_likelihood_numpy.html