Update blackbox likelihood example #631
Conversation
Check out this pull request on ReviewNB: see visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB
Force-pushed from 03ec340 to 26d2f9d
- Make Op more generic (accepts distinct inputs, returns vectorized loglikelihood), for better transfer learning
- Build Ops more iteratively, and show useful debugging strategies
- Use CustomDist instead of Potential (added a short section at the end showing Potential); a rough sketch of this pattern follows after this list
- Actually make use of finite differences as mentioned in the text
- Remove mention of old Cython example and reference new JAX example
- Fix some cross-references (unclear success)
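For anyone skimming this thread, here is a minimal, hedged sketch of the CustomDist-based pattern the PR moves to. It is not the notebook's exact code: the black-box function `my_loglike`, the linear-model parameters `m`, `c`, `sigma`, `x`, and the model name `no_grad_model` are illustrative assumptions.

```python
import numpy as np
import pymc as pm
import pytensor.tensor as pt
from pytensor.graph import Apply, Op


def my_loglike(m, c, sigma, x, data):
    # Pretend this is a black-box (non-PyTensor) function: the pointwise
    # Gaussian log-density of `data` around the straight line m * x + c
    return -0.5 * ((data - (m * x + c)) / sigma) ** 2 - np.log(sigma) - 0.5 * np.log(2 * np.pi)


class LogLike(Op):
    def make_node(self, m, c, sigma, x, data):
        # Convert raw inputs (floats, arrays) into symbolic variables
        m, c, sigma, x, data = map(pt.as_tensor, (m, c, sigma, x, data))
        # A single output: the vector of pointwise log-likelihoods
        return Apply(self, [m, c, sigma, x, data], [data.type()])

    def perform(self, node, inputs, output_storage):
        m, c, sigma, x, data = inputs  # numpy values at runtime
        output_storage[0][0] = np.asarray(my_loglike(m, c, sigma, x, data))


loglike_op = LogLike()


def custom_dist_loglike(data, m, c, sigma, x):
    # The observed value is always the first argument of a CustomDist logp
    return loglike_op(m, c, sigma, x, data)


x = np.linspace(0, 9, 10)
data = 0.4 * x + 3.0 + np.random.default_rng(42).normal(scale=1.0, size=x.size)

with pm.Model() as no_grad_model:
    m = pm.Uniform("m", lower=-10.0, upper=10.0)
    c = pm.Uniform("c", lower=-10.0, upper=10.0)
    # CustomDist instead of Potential: the likelihood stays tied to the observed data
    pm.CustomDist("likelihood", m, c, 1.0, x, observed=data, logp=custom_dist_loglike)
```

The Potential alternative mentioned in the list would instead wrap the same Op output as `pm.Potential("likelihood", loglike_op(m, c, 1.0, x, data).sum())`, at the cost of losing the explicit link between the likelihood and the observed variable.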
Force-pushed from 26d2f9d to ba8d56d
Is the pre-commit.ci failure to build known?
jessegrabowski commented on 2024-02-02T22:53:25Z

One problem I have with this framing is that it blurs the purpose of this tutorial a bit. Are we learning how to implement a likelihood (this is what the name suggests), or are we learning how to wrap arbitrary PyTensor Ops, using the specific example of a likelihood?

I would prefer the framing of the introduction to be very squarely on automatic logp inference,

I'd like to see a separate tutorial called "PyTensor for PyMC Users" that covers how to wrap arbitrary Python code for PyTensor use, then have this one focus more explicitly on the difference between

To help most users, I think this tutorial should be a flowchart like: Can you write down the generative process? -->

ricardoV94 commented on 2024-02-14T11:20:41Z

I think we need a proper notebook on CustomDist with dist behavior, probably a core notebook, not even pymc-examples.

This tutorial has always been about a black-box likelihood, originally a complicated Cython function, that got removed in the last hackathon because it was too complicated for devs to maintain. Now we have this baby numpy alternative here, which I think is fine.

The difference between a black-box likelihood/model and individual Ops is thin. I don't want to delve into that here because it requires more PyTensor exposition. I just wanted to make this NB more useful than it was before, not to change the pedagogy.

ricardoV94 commented on 2024-02-14T11:35:14Z

May be good to emphasize that we could black box the model part only, and still use a
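A hedged sketch of what that last (truncated) suggestion might look like: only the forward model is a black box, and its output feeds an ordinary pm.Normal likelihood, so no custom logp is needed. The ForwardModel Op below is an illustrative assumption, not code from the notebook.

```python
import numpy as np
import pymc as pm
import pytensor.tensor as pt
from pytensor.graph import Apply, Op


class ForwardModel(Op):
    """Black-box deterministic model: here simply m * x + c computed in numpy."""

    def make_node(self, m, c, x):
        m, c, x = map(pt.as_tensor, (m, c, x))
        return Apply(self, [m, c, x], [pt.dvector()])

    def perform(self, node, inputs, output_storage):
        m, c, x = inputs
        output_storage[0][0] = np.asarray(m * x + c, dtype="float64")


forward_op = ForwardModel()

x = np.linspace(0, 9, 10)
data = 0.4 * x + 3.0

with pm.Model():
    m = pm.Uniform("m", lower=-10.0, upper=10.0)
    c = pm.Uniform("c", lower=-10.0, upper=10.0)
    mu = forward_op(m, c, x)  # only the model part is black-boxed
    # ...while the likelihood stays a regular PyMC distribution
    pm.Normal("likelihood", mu=mu, sigma=1.0, observed=data)
```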
jessegrabowski commented on 2024-02-02T22:53:26Z

Line #10: raise TypeError(f"Invalid input type to loglike: {type(param)}")

If we don't like my idea to split the example, we should find a less contrived example imo.

ricardoV94 commented on 2024-02-14T11:28:05Z

I think a contrived example is fine. Before, the example failed because the parameters were passed as a vector and there was an indexing operation. Only a sufficiently complex example will be non-contrived, and that will necessarily be less readable. In the JAX example I did try this by writing an HMM with jax.scan and making use of vmap (of course you can do it in PyTensor, that was before

There was already pushback there that the example was too complex and detracted from the point.

https://www.pymc.io/projects/examples/en/latest/howto/wrapping_jax_function.html
jessegrabowski commented on 2024-02-02T22:53:27Z

Can we write out "gradients"? And link to some docs about NUTS for why it matters? Trying to think about new users coming into this.

ricardoV94 commented on 2024-02-14T11:28:28Z

Yeah, definitely.

ricardoV94 commented on 2024-03-13T11:15:09Z

Is the part mentioned earlier not enough?

> Another issue is that if you want to be able to use the gradient-based step samplers like {term}`NUTS` and {term}`Hamiltonian Monte Carlo`, then your model/likelihood needs a gradient to be defined. If you have a model that is defined as a set of PyTensor operators then this is no problem - internally it will be able to do automatic differentiation - but if your model is essentially a "black box" then you won't necessarily know what the gradients are.
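To make the quoted point concrete, a small self-contained sketch (a stand-in Op, not the notebook's code) of how asking PyTensor for gradients fails until a grad() method is defined:

```python
import numpy as np
import pytensor
import pytensor.tensor as pt
from pytensor.graph import Apply, Op


class LogLikeNoGrad(Op):
    """Black-box loglike with no grad() defined (illustrative stand-in)."""

    def make_node(self, m, c, sigma, x, data):
        m, c, sigma, x, data = map(pt.as_tensor, (m, c, sigma, x, data))
        return Apply(self, [m, c, sigma, x, data], [pt.dvector()])

    def perform(self, node, inputs, output_storage):
        m, c, sigma, x, data = inputs
        output_storage[0][0] = -0.5 * ((data - (m * x + c)) / sigma) ** 2


m = pt.scalar("m")
c = pt.scalar("c")
x = np.linspace(0, 9, 10)
data = 0.4 * x + 3.0

out = LogLikeNoGrad()(m, c, 1.0, x, data)

try:
    # This is exactly what NUTS/HMC need; it fails until grad() is implemented
    pytensor.grad(out.sum(), wrt=[m, c])
except Exception as err:
    print(type(err).__name__, err)
```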
jessegrabowski commented on 2024-02-02T22:53:28Z

Line #5: def make_node(self, m, c, sigma, x, data):

Type annotations would be helpful I think, and it would help to highlight the difference between

ricardoV94 commented on 2024-02-14T11:28:40Z

Agree.

ricardoV94 commented on 2024-03-13T11:19:38Z

On revisiting this, not sure we can do much with the
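One possible shape of the annotated make_node being discussed. The annotations (including the TensorLike alias defined here) are an illustration of the suggestion, not what the notebook ended up using:

```python
import numpy as np
import pytensor.tensor as pt
from pytensor.graph import Apply, Op
from pytensor.tensor import TensorVariable

# Illustrative alias (not a PyTensor type): anything pt.as_tensor can convert
TensorLike = float | np.ndarray | TensorVariable


class LogLike(Op):
    def make_node(
        self,
        m: TensorLike,      # slope (scalar)
        c: TensorLike,      # intercept (scalar)
        sigma: TensorLike,  # observation noise (scalar)
        x: TensorLike,      # covariates (vector)
        data: TensorLike,   # observations (vector)
    ) -> Apply:
        # as_tensor accepts plain numbers/arrays as well as symbolic variables,
        # which is part of what the annotations are meant to highlight
        inputs = [pt.as_tensor(v) for v in (m, c, sigma, x, data)]
        return Apply(self, inputs, [pt.dvector()])

    def perform(self, node: Apply, inputs: list, output_storage: list) -> None:
        m, c, sigma, x, data = inputs
        output_storage[0][0] = -0.5 * ((data - (m * x + c)) / sigma) ** 2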
jessegrabowski commented on 2024-02-02T22:53:28Z

Line #12: # use a Potential to "call" the Op and include it in the logp computation

ricardoV94 commented on 2024-03-13T11:25:22Z

Fixed.
jessegrabowski commented on 2024-02-02T22:53:29Z

"if we ask the model to compute gradients of the log-likelihood"? I don't think it's immediately obvious what

ricardoV94 commented on 2024-03-13T11:26:42Z

Fixed.
jessegrabowski commented on 2024-02-02T22:53:30Z

I am really really hesitant to use finite difference gradients in an example, because it suggests that this is a good idea. I think it would be better to show the pencil pushing to get analytic gradients.

ricardoV94 commented on 2024-02-14T11:33:08Z

I know where you are coming from; I also see users trying to get out of PyTensor/math too soon.

Still, for simplicity's sake I would keep it. I think the example with the closed-form solution was less informative. Also, finite differences are a great gradient debugging tool (and what verify_grad does), so it's nice to show it. I can emphasize more strongly that analytic expressions are usually preferable.

ricardoV94 commented on 2024-03-13T11:36:49Z

Rephrased text to be less positive and added an explicit caution section.
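For context, a bare-bones sketch of the kind of finite-difference gradient being debated: a hand-rolled central difference, which is only an assumption about the notebook's approach. Analytic gradients are usually preferable; finite differences are mainly useful for debugging (e.g. what pytensor.gradient.verify_grad relies on).

```python
import numpy as np


def my_loglike(m, c, sigma, x, data):
    # Black-box pointwise Gaussian log-likelihood of a straight line
    return -0.5 * ((data - (m * x + c)) / sigma) ** 2 - np.log(sigma) - 0.5 * np.log(2 * np.pi)


def finite_diff_grads(m, c, sigma, x, data, eps=1e-6):
    """Central-difference gradients of the summed loglike w.r.t. m and c.

    Caution: the step size must be tuned, accuracy is limited, and each
    parameter costs two extra likelihood evaluations per gradient.
    """
    grads = []
    for i, value in enumerate([m, c]):
        params_plus = [m, c]
        params_minus = [m, c]
        params_plus[i] = value + eps
        params_minus[i] = value - eps
        f_plus = my_loglike(*params_plus, sigma, x, data).sum()
        f_minus = my_loglike(*params_minus, sigma, x, data).sum()
        grads.append((f_plus - f_minus) / (2 * eps))
    return grads  # [d loglike / dm, d loglike / dc]


x = np.linspace(0, 9, 10)
data = 0.4 * x + 3.0
print(finite_diff_grads(0.5, 2.0, 1.0, x, data))
```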
jessegrabowski commented on 2024-02-02T22:53:31Z

2nd link should go to https://en.wikipedia.org/wiki/Pushforward_(differential) instead of Jacobian perhaps.

ricardoV94 commented on 2024-02-14T11:35:58Z

Thanks.
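For readers following along: what Op.grad is expected to return for each input is the vector-Jacobian product of the incoming output gradient with that input's Jacobian (reverse-mode autodiff), not the full Jacobian itself. Schematically:

```latex
% Output gradient \bar{y} is supplied by PyTensor; one product is returned per input x_i
\bar{x}_i = \bar{y}^{\top}\,\frac{\partial y}{\partial x_i}
```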
jessegrabowski commented on 2024-02-02T22:53:32Z

Line #43: pytensor.gradient.grad_not_implemented(self, 4, data),

I actually like this, I've never seen the returns written out one at a time like this. But that's also a negative, because 99% of the time you'll just do g * outputs.

ricardoV94 commented on 2024-02-14T11:37:08Z

That's a big part of what I wanted to show in the updated NB. Users always ask about multiple inputs / outputs when referring to this NB / attempting their own Op.
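A rough sketch of the grad-method pattern being discussed, with one return value per input and grad_not_implemented for the inputs we do not differentiate through. Both the gradient Op loglikegrad_op and the input ordering are assumptions for this sketch, not the notebook's exact code (LogLike is the Op sketched near the top of this thread):

```python
import pytensor.tensor as pt
from pytensor.gradient import grad_not_implemented


class LogLikeWithGrad(LogLike):
    def grad(self, inputs, output_gradients):
        m, c, sigma, x, data = inputs
        # loglikegrad_op would be another Op returning d(loglike)/dm and d(loglike)/dc
        # (e.g. via finite differences, as above); assumed to exist for this sketch
        grad_wrt_m, grad_wrt_c = loglikegrad_op(m, c, sigma, x, data)
        [out_grad] = output_gradients
        # One entry per input, in the same order as in make_node
        return [
            pt.sum(out_grad * grad_wrt_m),         # wrt m
            pt.sum(out_grad * grad_wrt_c),         # wrt c
            grad_not_implemented(self, 2, sigma),  # wrt sigma (not needed)
            grad_not_implemented(self, 3, x),      # wrt x (not needed)
            grad_not_implemented(self, 4, data),   # wrt data (not needed)
        ]
```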
jessegrabowski commented on 2024-02-02T22:53:32Z

Line #12: # use a Potential to "call" the Op and include it in the logp computation

ricardoV94 commented on 2024-03-13T11:45:11Z

Fixed.
jessegrabowski commented on 2024-02-02T22:53:33Z

The models as written so far wouldn't have any forward sampling capability either, right? It might be nice to highlight that, the same way you highlighted that

ricardoV94 commented on 2024-03-13T11:46:30Z

Not sure if I want to emphasize that with an example. Feel it might distract from the main topic.
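For readers wondering what the "forward sampling" point refers to: a CustomDist defined only through logp cannot be drawn from, so prior/posterior predictive sampling fails unless a random (or dist) implementation is also supplied. A hedged sketch of the general idea; the random signature shown follows the pm.CustomDist docs as I understand them, and the helper names are assumptions:

```python
import numpy as np
import pymc as pm


def line_logp(value, m, c, sigma, x):
    # Pointwise Gaussian logp of value around m * x + c (stands in for the black-box Op)
    return -0.5 * ((value - (m * x + c)) / sigma) ** 2 - pm.math.log(sigma)


def line_random(m, c, sigma, x, rng=None, size=None):
    # Forward draw: without this, pm.sample_prior_predictive on the model would fail
    return rng.normal(loc=m * x + c, scale=sigma, size=size)


x = np.linspace(0, 9, 10)
data = 0.4 * x + 3.0

with pm.Model():
    m = pm.Uniform("m", lower=-10.0, upper=10.0)
    c = pm.Uniform("c", lower=-10.0, upper=10.0)
    pm.CustomDist(
        "likelihood", m, c, 1.0, x, observed=data, logp=line_logp, random=line_random
    )
    prior = pm.sample_prior_predictive(100)
```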
> Now, let's use this Op to repeat the example shown above. To do this let's create some data containing a straight line with additive Gaussian noise (with a mean of zero and a standard deviation of `sigma`). For simplicity we set {class}`~pymc.Uniform` prior distributions on the gradient and y-intercept. As we've not set the `grad()` method of the Op, PyMC will not be able to use the gradient-based samplers, so will fall back to using the {class}`pymc.Slice` sampler.
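As a rough illustration of that quoted paragraph, reusing the `no_grad_model` sketched near the top of this thread (the comments paraphrase the expected behaviour; they are not captured sampler logs):

```python
import pymc as pm

# `no_grad_model` is the CustomDist model sketched earlier; its likelihood Op has
# no grad(), so NUTS cannot be used
with no_grad_model:
    # PyMC falls back to a gradient-free step method (Slice) for m and c
    idata = pm.sample(1000, tune=1000)
```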
The pymc.Slice isn't rendered as a link, but the reference is right here and should be left as is. The issue is on the PyMC API docs.

I have no idea, I haven't tracked the latest PRs and wasn't even aware we had it (or what it does).

Similar thinking on my end. If it really is a black box, I think the recommendation should go more towards using samplers that don't require gradients (several of which are available in PyMC already, so it doesn't complicate anything really). Here, however, it isn't, so we can go ahead and also illustrate how to extend the Op with the grad method.
I added some thoughts on NB review
Force-pushed from 6778109 to 1368976
📚 Documentation preview 📚: https://pymc-examples--631.org.readthedocs.build/en/631/
Link to NB: https://pymcio--631.org.readthedocs.build/projects/examples/en/631/howto/blackbox_external_likelihood_numpy.html