Update GLM Model Selection to best practices #173
Conversation
I am facing an error in cell 16 and would like some help with resolving it.
View / edit / reply to this conversation on ReviewNB

OriolAbril commented on 2021-07-28T04:19:10Z: @aloctavodia for this to work we need to add the transformed variables to the InferenceData object (or to the dict that is generated from the summary df). Does doing that manually make sense? Is there some bambi magic to help with that? Or maybe this can be removed altogether and keep only waic and loo?

chiral-carbon commented on 2021-08-31T16:33:47Z: the error message has changed here from two commits ago; it has to do with the arguments passed to ...

Sayam753 commented on 2021-09-08T15:46:06Z: I have two questions and a working solution to fix the error above.
Here come the questions first. The description above the code cell says "Evaluate log likelihoods straight from ...", but the code cell does not calculate the log likelihood of the model; instead it computes the log probability of the entire model. For context, a PyMC3 model's log prob = sum over all dimensions of the log prob of all free RVs + log likelihood of the observed RVs.

Question 1 - For this part of model selection, are we interested in the log probability of the entire model or just the model's log likelihood?

Question 2 - If we are interested in the model's log likelihood, can we use the ...

Coming to the solution to fix the error:

I created two helper functions that rectify the shapes and include the transformed samples in the trace.

And things worked. Here is the GitHub gist of my experiment - https://nbviewer.jupyter.org/gist/Sayam753/d605220d570be6697b4755a9f83a5dd4

I think these functions can be used by bambi.
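To make the distinction concrete, here is a minimal sketch of the difference (PyMC3 3.x API as I recall it; the toy model and variable names are my own, not the notebook's): the model log probability sums the log prob of the free RVs and the log likelihood of the observed RVs, whereas WAIC/LOO only need the pointwise log likelihood of the observed RVs.

```python
import pymc3 as pm

# toy model standing in for the notebook's GLM
with pm.Model() as model:
    beta = pm.Normal("beta", mu=0, sigma=10)
    y = pm.Normal("y", mu=beta, sigma=1, observed=[1.0, 2.0, 3.0])

point = model.test_point
total_logp = model.logp(point)              # log probability of the entire model
obs_loglike = y.logp_elemwise(point).sum()  # log likelihood of the observed RV only
free_rv_logp = total_logp - obs_loglike     # contribution from the free RVs (prior)
```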
View / edit / reply to this conversation on ReviewNB OriolAbril commented on 2021-07-28T04:19:10Z In my opinion, this should create two dictionaries of traces, one for lin and the other for quad, instead of combining them all into the same dict. Something like:
lin_trace_dict = dict()
quad_trace_dict = dict()
for nm in ["k1", "k2", "k3", "k4", "k5"]:
    lin_trace_dict.update({f"{nm}": traces_lin[nm]})
    quad_trace_dict.update({f"{nm}": traces_quad[nm]})
Then we also generate two dfs instead of one.

chiral-carbon commented on 2021-07-30T11:40:53Z: sure, yes, it makes more sense.

Sayam753 commented on 2021-08-29T20:50:12Z: Why are we creating a new copy of ...?

OriolAbril commented on 2021-08-30T14:28:32Z: I thought we were converting a list into a dict; this is not necessary indeed.
View / edit / reply to this conversation on ReviewNB

OriolAbril commented on 2021-07-28T04:19:11Z: And therefore also call ...

_, axes = plt.subplots(1, 2, sharey=True)
az.plot_compare(dfwaic_lin, ax=axes[0])
az.plot_compare(dfwaic_quad, ax=axes[1])
Also, even though we basically have one model in this notebook, we have 3 different representations of it, which I think can be quite confusing: the math + widgets use one set of variable names, the manual pymc3 code another, and the glm/bambi model a third.

Ideally these should all be the same or extremely similar and unambiguous, so the notebook is easier to follow, but I am not sure how bambi chooses names or if it's possible to customize those somehow, so before choosing which names to use I wanted to get an idea of the possibilities. cc @martinacantaro

The model also has a lot of divergences, even in the linear/quadratic versions, which probably has an effect on waic and loo, which also always show warnings. So while the results don't make sense (i.e. k5 comes out as the best model for the quadratic data), there isn't really any reason why they should make any sense. I noticed that only the x data is standardized; maybe standardizing y too (plus maybe undoing the transformation for the plots) works better? Other ideas?
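A minimal sketch of the standardization idea floated above (the toy dataframe and column names are placeholders, not the notebook's actual variables):

```python
import numpy as np
import pandas as pd

# toy data standing in for the notebook's quadratic dataset
rng = np.random.default_rng(42)
df = pd.DataFrame({"x": np.linspace(-1, 1, 100)})
df["y"] = 2 * df["x"] ** 2 + rng.normal(0, 0.2, size=100)

# standardize y as well as x before fitting the models
y_mean, y_sd = df["y"].mean(), df["y"].std()
df["y_std"] = (df["y"] - y_mean) / y_sd

# after sampling, map predictions back to the original scale for the plots
preds_std = df["y_std"].to_numpy()          # stand-in for posterior predictive draws
preds_original = preds_std * y_sd + y_mean  # undo the standardization
```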
View / edit / reply to this conversation on ReviewNB

Sayam753 commented on 2021-08-29T20:38:57Z: Line #133, "Create a polynomial modelspec string for patsy", should be "Create a polynomial modelspec string for bambi".
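For context, a hypothetical sketch of what such a polynomial model-spec string builder might look like (the notebook's actual helper may differ):

```python
def create_poly_modelspec(k: int = 1) -> str:
    """Build a polynomial formula string for bambi, e.g. 'y ~ 1 + x + np.power(x, 2)'."""
    return "y ~ 1 + x " + " ".join(f"+ np.power(x, {j})" for j in range(2, k + 1))

print(create_poly_modelspec(3))  # y ~ 1 + x + np.power(x, 2) + np.power(x, 3)
```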
View / edit / reply to this conversation on ReviewNB

OriolAbril commented on 2021-08-30T14:39:46Z: This has to be removed.
View / edit / reply to this conversation on ReviewNB

OriolAbril commented on 2021-08-30T14:39:47Z: waic stands for "widely applicable information criterion" or "Watanabe–Akaike information criterion"; the "appliable" seems like a momentary confusion.

Also remember to update the descriptions of what the code does. We no longer loop over models to compute waic; giving the dictionary to arviz is enough.

chiral-carbon commented on 2021-08-31T13:13:39Z: will change this.
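A minimal sketch of what "giving the dictionary to arviz" could look like (the model names and example data below are placeholders, not the notebook's fitted k1..k5 models):

```python
import arviz as az

# stand-ins for the notebook's fitted models
compare_dict = {
    "k1": az.load_arviz_data("centered_eight"),
    "k2": az.load_arviz_data("non_centered_eight"),
}

# one az.compare call replaces the manual loop over models
dfwaic_lin = az.compare(compare_dict, ic="waic")
```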
View / edit / reply to this conversation on ReviewNB

OriolAbril commented on 2021-08-30T14:39:47Z: Now the model is converging and the results are as expected: k1 for the linear data and k2 for the quadratic data. Also note the change in scale from deviance to log, so we no longer want lower waic (see https://discourse.pymc.io/t/is-model-with-high-or-low-waic-and-loo-better/7874/2 for more details).

chiral-carbon commented on 2021-08-31T13:15:45Z: right. So should I remove the points mentioning the importance/requirement of lower waic everywhere and instead say higher waic is best (due to the change in scale from deviance to log)?

chiral-carbon commented on 2021-08-31T16:39:21Z: should I also mention the scale used, for clarification?

OriolAbril commented on 2021-09-08T09:57:45Z: I would add a note about the existence and extended usage of multiple scales and recommend reading the docstring of the function in case of doubt. See also the other changes about adding references.
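To illustrate the scale point in code, a hedged sketch using an ArviZ example dataset as a placeholder for the notebook's models (the az.waic docstring is the authoritative reference):

```python
import arviz as az

idata = az.load_arviz_data("centered_eight")  # placeholder for one of the fitted models
az.waic(idata, scale="log")       # elpd scale: higher is better (current default)
az.waic(idata, scale="deviance")  # deviance scale: lower is better (older convention)
```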
View / edit / reply to this conversation on ReviewNB

OriolAbril commented on 2021-08-30T14:39:48Z: same comment here.
View / edit / reply to this conversation on ReviewNB

OriolAbril commented on 2021-08-30T14:40:43Z: This has to be removed.
View / edit / reply to this conversation on ReviewNB

OriolAbril commented on 2021-08-30T14:40:43Z: waic stands for "widely applicable information criterion" or "Watanabe–Akaike information criterion"; the "appliable" seems like a momentary confusion.

Also remember to update the descriptions of what the code does. We no longer loop over models to compute waic; giving the dictionary to arviz is enough.
View / edit / reply to this conversation on ReviewNB

OriolAbril commented on 2021-08-30T14:40:44Z: Now the model is converging and the results are as expected: k1 for the linear data and k2 for the quadratic data. Also note the change in scale from deviance to log, so we no longer want lower waic (see https://discourse.pymc.io/t/is-model-with-high-or-low-waic-and-loo-better/7874/2 for more details).
View / edit / reply to this conversation on ReviewNB

OriolAbril commented on 2021-08-30T14:40:45Z: same comment here.
View / edit / reply to this conversation on ReviewNB

chiral-carbon commented on 2021-08-31T13:33:41Z: hey @tomicapretto, I had run this same code block for GLM using bambi in the previous commit for this notebook but did not get this error. I used ...

tomicapretto commented on 2021-08-31T13:44:57Z: If you write "Intercept" instead of "intercept", it should work. But this is still a bug I'd like to fix in the future. Let me know if it fixes the problem!

chiral-carbon commented on 2021-08-31T15:50:32Z: Thank you, it did! Very silly of me to not realize this 😅
View / edit / reply to this conversation on ReviewNB

OriolAbril commented on 2021-09-08T10:04:17Z: Add an extra sentence to the paragraph "WAIC is implemented in {func} ...".

Note: ReviewNB renders everything wrong in the comments; keep in mind the right syntax, as copy-pasting won't work. Ref: https://arviz-devs.github.io/arviz/contributing/developer_guide.html#hyperlinks
View / edit / reply to this conversation on ReviewNB

OriolAbril commented on 2021-09-08T10:04:18Z: I would add a link to ...
View / edit / reply to this conversation on ReviewNB

Sayam753 commented on 2021-09-08T14:46:45Z: Line #155, fml, df, priors={"intercept": bmb.Prior("Normal", mu=0, sigma=100)}, family="gaussian" — there is a mismatch in the naming of the prior, either "intercept" or "Intercept", in the comments below. Do we need to correct things here as well?

tomicapretto commented on 2021-09-08T14:52:41Z: Yes, the intercept is always named "Intercept", so "Intercept" is what you need if you want to put a prior on it.
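A self-contained sketch following that correction (the toy dataframe and formula below are placeholders for the notebook's fml and df, not its actual code):

```python
import numpy as np
import pandas as pd
import bambi as bmb

# toy stand-ins for the notebook's dataframe and formula string
rng = np.random.default_rng(0)
df = pd.DataFrame({"x": np.linspace(0, 1, 50)})
df["y"] = 1 + 2 * df["x"] + rng.normal(0, 0.1, size=50)
fml = "y ~ x"

# the intercept term is always named "Intercept", so that is the key to use
priors = {"Intercept": bmb.Prior("Normal", mu=0, sigma=100)}
model = bmb.Model(fml, df, priors=priors, family="gaussian")
```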
Addresses issue #83 and aims to advance it to Best Practices