
Take3 model averaging #414

Merged: 15 commits into pymc-devs:main on Sep 21, 2022
Conversation

@reshamas (Contributor) commented Aug 9, 2022

Description

References

Checklist

Helpful links

Notes for the Reviewer

  • There are two references to "model_comparison.ipynb" and the link is broken. I cannot find this notebook.
  • I changed mentions of "PyMC3" to "PyMC". Is that ok?

#DataUmbrellaPyMCSprint

@review-notebook-app
Check out this pull request on ReviewNB to see visual diffs & provide feedback on Jupyter Notebooks.

@reshamas reshamas requested a review from OriolAbril August 9, 2022 12:07
@reshamas (Contributor, author) commented Aug 9, 2022

@OriolAbril
Thank you for the helpful Git notes (#412 (comment))

All checks are passing now. (phew!)

I have a few notes at the top, under "Notes for the Reviewer", regarding the updates I made to the notebook.

@OriolAbril (Member)

> There are two references to "model_comparison.ipynb" and the link is broken. I cannot find this notebook.

Here are the references and links (not to be used, only so you see where they point to):

> I changed mentions of "PyMC3" to "PyMC". Is that ok?

Yes, all notebooks need to be updated from PyMC3 to PyMC, both in text and in code. Since you are working on the text, you should fix it.

> Closes #67

It should not close the issue. The notebook still runs on v3, so it will move to the "book style" column, not yet to "done" (which is what would close the issue).

myst_nbs/diagnostics_and_criticism/model_averaging.myst.md (outdated)

One alternative is to perform model selection but discuss all the different models together with the computed values of a given Information Criterion. It is important to put all these numbers and tests in the context of our problem so that we and our audience can have a better feeling of the possible limitations and shortcomings of our methods. If you are in the academic world you can use this approach to add elements to the discussion section of a paper, presentation, thesis, and so on.

- Yet another approach is to perform model averaging. The idea now is to generate a meta-model (and meta-predictions) using a weighted average of the models. There are several ways to do this and PyMC3 includes 3 of them that we are going to briefly discuss, you will find a more thorough explanation in the work by [Yuling Yao et. al.](https://arxiv.org/abs/1704.02030)
+ Yet another approach is to perform model averaging. The idea now is to generate a meta-model (and meta-predictions) using a weighted average of the models. There are several ways to do this and PyMC includes 3 of them that we are going to briefly discuss, you will find a more thorough explanation in the work by [Yuling Yao et. al.](https://arxiv.org/abs/1704.02030)

Also, model comparison is done by ArviZ, so we should update "PyMC includes" to something like "PyMC integrates with ArviZ".

myst_nbs/diagnostics_and_criticism/model_averaging.myst.md (outdated)
@@ -71,7 +80,7 @@ The above formula for computing weights is a very nice and simple approach, but

## Stacking

- The third approach implemented in PyMC3 is know as _stacking of predictive distributions_ and it has been recently [proposed](https://arxiv.org/abs/1704.02030). We want to combine several models in a metamodel in order to minimize the diverge between the meta-model and the _true_ generating model, when using a logarithmic scoring rule this is equivalently to:
+ The third approach implemented in PyMC is known as [_stacking of predictive distributions_](https://arxiv.org/abs/1704.02030). We want to combine several models in a metamodel in order to minimize the divergence between the meta-model and the _true_ generating model, when using a logarithmic scoring rule this is equivalent to:

Same here (it's actually the same paper as above, so so far just 1 reference to add to the bibtex file).
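For reference, the sentence in the hunk above ends with an objective that is not shown in this diff. A sketch of it, following the notation of the Yao et al. paper linked in the text (and not copied from the notebook itself), is:

$$
\max_{w \in S_K} \frac{1}{n} \sum_{i=1}^{n} \log \sum_{k=1}^{K} w_k \, p(y_i \mid y_{-i}, M_k),
\qquad
S_K = \left\{ w : w_k \ge 0,\ \textstyle\sum_{k=1}^{K} w_k = 1 \right\}
$$

where $p(y_i \mid y_{-i}, M_k)$ is the leave-one-out predictive density of observation $i$ under model $M_k$.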


- The following example is taken from the superb book [Statistical Rethinking](http://xcelab.net/rm/statistical-rethinking/) by Richard McElreath. You will find more PyMC3 examples from this book in this [repository](https://github.com/aloctavodia/Statistical-Rethinking-with-Python-and-PyMC3). We are going to explore a simplified version of it. Check the book for the whole example and a more thorough discussion of both, the biological motivation for this problem and a theoretical/practical discussion of using Information Criteria to compare, select and average models.
+ The following example is taken from the superb book [Statistical Rethinking](http://xcelab.net/rm/statistical-rethinking/) by Richard McElreath. You will find more PyMC examples from this book in this [repository](https://github.com/aloctavodia/Statistical-Rethinking-with-Python-and-PyMC3). We are going to explore a simplified version of it. Check the book for the whole example and a more thorough discussion of both, the biological motivation for this problem and a theoretical/practical discussion of using Information Criteria to compare, select and average models.

Statistical Rethinking should be a citation (it already is in the bibtex file, but it would be good to add the URL to the bibtex entry). The link to the PyMC port of the book code should now point to https://github.com/pymc-devs/pymc-resources instead.

myst_nbs/diagnostics_and_criticism/model_averaging.myst.md (outdated)
```

+++ {"papermill": {"duration": 0.055089, "end_time": "2020-11-29T12:14:57.977616", "exception": false, "start_time": "2020-11-29T12:14:57.922527", "status": "completed"}, "tags": []}

- Now that we have sampled the posterior for the 3 models, we are going to use WAIC (Widely applicable information criterion) to compare the 3 models. We can do this using the `compare` function included with PyMC3.
+ Now that we have sampled the posterior for the 3 models, we are going to use WAIC (Widely applicable information criterion) to compare the 3 models. We can do this using the `compare` function included with PyMC.

Similar comment to the one in the introduction: `compare` is an ArviZ function now.

We should probably also add a note or comment on the code. I think the code does use WAIC, but `az.compare` now defaults to using LOO instead, so running the same code will not use WAIC anymore.
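To make that default change concrete, here is a minimal sketch using the example datasets bundled with ArviZ (they only stand in for the notebook's three models, which are not refit here): passing `ic="waic"` keeps the comparison on WAIC, since `az.compare` otherwise defaults to LOO.

```python
import arviz as az

# Example datasets shipped with ArviZ, used here only as stand-ins for the
# notebook's fitted models.
models = {
    "centered": az.load_arviz_data("centered_eight"),
    "non_centered": az.load_arviz_data("non_centered_eight"),
}

# az.compare now defaults to LOO, so WAIC has to be requested explicitly
# if the surrounding text keeps talking about WAIC.
comp_waic = az.compare(models, ic="waic")
print(comp_waic)
```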


Or maybe leave a note on the issue for whoever updates the code and reruns it on PyMC v4? (Which can't really be done for now, as sample_posterior_predictive_w doesn't work on v4 yet.)


- We can also see that we get a column with the relative `weight` for each model (according to the first equation at the beginning of this notebook). This weights can be _vaguely_ interpreted as the probability that each model will make the correct predictions on future data. Of course this interpretation is conditional on the models used to compute the weights, if we add or remove models the weights will change. And also is dependent on the assumptions behind WAIC (or any other Information Criterion used). So try to do not overinterpret these `weights`.
+ We can also see that we get a column with the relative `weight` for each model (according to the first equation at the beginning of this notebook). This weights can be _vaguely_ interpreted as the probability that each model will make the correct predictions on future data. Of course this interpretation is conditional on the models used to compute the weights, if we add or remove models the weights will change. And also is dependent on the assumptions behind WAIC (or any other Information Criterion used). So try to not overinterpret these `weights`.

This also needs a note; not sure how to rewrite it though, I'll try to come back later. The weight-probability interpretation is only valid for BMA, not for stacking. The notebook should be clear on this because it is a common source of confusion, see arviz-devs/arviz#2077 or https://discourse.pymc.io/t/bayesian-model-averaging-ranking-of-model-weights-and-loo-dont-match/4658
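A small sketch of that distinction, again with ArviZ's example datasets rather than the notebook's models: the `weight` column depends on the `method` argument, and only the pseudo-BMA style weights support the approximate "probability of making the best predictions" reading, while stacking weights are the optimal mixture weights for the combined predictive distribution.

```python
import arviz as az

# Illustrative stand-ins for the notebook's competing models.
models = {
    "centered": az.load_arviz_data("centered_eight"),
    "non_centered": az.load_arviz_data("non_centered_eight"),
}

# Stacking (the az.compare default): weights chosen to optimize the combined
# predictive distribution, not posterior model probabilities.
print(az.compare(models, method="stacking")["weight"])

# Bayesian-bootstrap pseudo-BMA: weights that admit the (approximate)
# probability interpretation discussed in the notebook text.
print(az.compare(models, method="BB-pseudo-BMA")["weight"])
```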

myst_nbs/diagnostics_and_criticism/model_averaging.myst.md (outdated)
myst_nbs/diagnostics_and_criticism/model_averaging.myst.md (outdated)
@@ -247,11 +268,11 @@ comp

+++ {"papermill": {"duration": 0.056609, "end_time": "2020-11-29T12:14:58.387481", "exception": false, "start_time": "2020-11-29T12:14:58.330872", "status": "completed"}, "tags": []}

- We can see that the best model is `model_2`, the one with both predictor variables. Notice the DataFrame is ordered from lowest to highest WAIC (_i.e_ from _better_ to _worst_ model). Check [this notebook](model_comparison.ipynb) for a more detailed discussing on model comparison.
+ We can see that the best model is `model_2`, the one with both predictor variables. Notice the DataFrame is ordered from lowest to highest WAIC (_i.e_ from _better_ to _worst_ model). Check [model_comparison notebook](model_comparison.ipynb) for a more detailed discussion on model comparison.

This should also be a Sphinx cross-reference, not a local Markdown link.

@reshamas (Contributor, author)

@OriolAbril This seems to be the pre-commit error. It's in another notebook, and I'm not sure how to fix it:

examples/howto/custom_distribution.ipynb:124: "If your distribution exists in scipy.stats (https://docs.scipy.org/doc/scipy/reference/stats.html), then you can use the Random Variates method scipy.stats.{dist_name}.rvs to generate random samples.\n",

@reshamas reshamas requested a review from OriolAbril September 2, 2022 16:10
@OriolAbril (Member)

The error is because that notebook, examples/howto/custom_distribution.ipynb, references the scipy docs with URLs instead of cross-references. Now that you have added the scipy docs to the list of domains to avoid, it is failing. For now, add that notebook to the list of ignored notebooks here: https://github.com/pymc-devs/pymc-examples/blob/main/.pre-commit-config.yaml#L62

@@ -6,9 +6,9 @@ jupytext:
format_version: 0.13
jupytext_version: 1.13.7
kernelspec:
- display_name: pymc-dev-py39
+ display_name: Python 3 (ipykernel)

These changes should not be included in the PR. The only notebook being modified should be the model averaging one.

@OriolAbril OriolAbril merged commit 9fad19c into pymc-devs:main Sep 21, 2022