Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update multilevel modeling notebook #246

Merged
merged 4 commits into from
Nov 14, 2021

Conversation

farhanreynaldo
Copy link
Contributor

@farhanreynaldo farhanreynaldo commented Nov 9, 2021

Updated the multilevel modeling notebook with the following improvements:

  • Abide by code preamble.
  • Try-except to load data.
  • Run on the latest PyMC and Arviz.
    Addresses issue Multilevel modeling #47.

Need to figure why the layout constrained warning is popped out and covariation_intercept_slope yielded Overflow Error on the latest library.

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@review-notebook-app
Copy link

review-notebook-app bot commented Nov 9, 2021

View / edit / reply to this conversation on ReviewNB

OriolAbril commented on 2021-11-09T11:29:36Z
----------------------------------------------------------------

Latest version should be 3.11.4. There might have been deprecations to prepare the way for v4 we should try and take into account here. Can you try updating?


farhanreynaldo commented on 2021-11-10T05:42:55Z
----------------------------------------------------------------

Sure, I've updated it on the latest commit.

@review-notebook-app
Copy link

View / edit / reply to this conversation on ReviewNB

OriolAbril commented on 2021-11-09T11:29:37Z
----------------------------------------------------------------

This is not doing anything. IIRC, the constrained layout setting is false by default, and it's the arviz-darkgrid style what sets it to true as it makes plots look better generally. It is still experimental though so it fails in some cases and it is also incompatible with other approaches like tight_layout. I added a comment on that, I think it's best to fix the plots with warnings but leave constrained_layout=True as default


@review-notebook-app
Copy link

View / edit / reply to this conversation on ReviewNB

OriolAbril commented on 2021-11-09T11:29:38Z
----------------------------------------------------------------

Do not set the global random seed. I think the notebook does not generate random samples directly with numpy, so having the seed and passing it to sampling methods should be enough.


@review-notebook-app
Copy link

review-notebook-app bot commented Nov 9, 2021

View / edit / reply to this conversation on ReviewNB

OriolAbril commented on 2021-11-09T11:29:38Z
----------------------------------------------------------------

Try seeing how the figure looks if removing the tight_layout at the end, and how it looks if adding constrained_layout=False in the call to plt.subplots. I think both should not have warnings, choose the one that looks best to you.


farhanreynaldo commented on 2021-11-09T13:25:15Z
----------------------------------------------------------------

I've tried both approaches in this cell, and it showed a practically similar figure. I think we could remove the tight_layout.

OriolAbril commented on 2021-11-09T18:06:09Z
----------------------------------------------------------------

sounds great

@review-notebook-app
Copy link

review-notebook-app bot commented Nov 9, 2021

View / edit / reply to this conversation on ReviewNB

OriolAbril commented on 2021-11-09T11:29:39Z
----------------------------------------------------------------

We should probably try and make this fail more gracefully (warning+no plot instead of error) but we can (and I think should directly avoid plotting the diagonal of chol_corr (which is a constant equal to 1).

The approach shown in https://github.com/arviz-devs/arviz/issues/1627 should work for that I think.

If you have time, it would also be great to create a minimal reproducible error of the overflow issue and open an issue in ArviZ. i.e. a traceplot with two variables only, one from np.random.normal and the other a constant one to get the same behaviour. If you take a look at the old plot, you'll see that there is no kde around 1 even though there is a line trace at 1 in the right column. The kde can' be generated for constants and it has never been, but somehow we have changed from silently failing and plotting nothing without warning to erroring out.


farhanreynaldo commented on 2021-11-09T14:59:14Z
----------------------------------------------------------------

The kde can' be generated for constants and it has never been, but somehow we have changed from silently failing and plotting nothing without warning to erroring out.

That's make sense. I will ignore the diagonal of chol_corr in this case.

And also when I opened this issue in Arviz, are you suggesting proposing warning and no plot when the constant cases happened?

OriolAbril commented on 2021-11-09T18:07:07Z
----------------------------------------------------------------

Yes, I think warning plus skipping to plot the variable is ideal. We can't plot, but simply skipping it is potentially confusing to users

@review-notebook-app
Copy link

review-notebook-app bot commented Nov 9, 2021

View / edit / reply to this conversation on ReviewNB

OriolAbril commented on 2021-11-09T11:29:40Z
----------------------------------------------------------------

Can you also add an authors section? We have not yet added this to the style guide but will soon. See https://github.com//issues/198#issuecomment-916486451. There are still a couple things left to be decided regarding this approach, depending on how everything looks we'll choose the final style and try and enforce it.

After a very quick look at GitHub PRs, I propose something like:

* Authored by Chris Fonnesbeck in May, 2017 ([pymc#2124](https://github.com/pymc-devs/pymc/pull/2124))
* Updated by Colin Carroll in June, 2018 ([pymc#3049](https://github.com/pymc-devs/pymc/pull/3049))
* Updated by Alex Andorra in January, 2020 ([pymc#3765](https://github.com/pymc-devs/pymc/pull/3765))
* Updated by Oriol Abril in June, 2020 ([pymc#3963](https://github.com/pymc-devs/pymc/pull/3963))
* Updated by Farhan Reynaldo in November 2021 ([pymc-examples#246](https://github.com/pymc-devs/pymc-examples/pull/246))

Plus adding the names only to the post directive at the top of the notebook


farhanreynaldo commented on 2021-11-09T14:52:04Z
----------------------------------------------------------------

Sure, I will add the authors section as you suggested.

Plus adding the names only to the post directive at the top of the notebook

I am not quite sure I understand this part, could clarify this?

OriolAbril commented on 2021-11-09T18:08:36Z
----------------------------------------------------------------

a bit like it's done with tags. The spline notebook does it: https://github.com/pymc-devs/pymc-examples/blob/main/examples/splines/spline.ipynb. The metadata we add in this post directive is hidden in the rendered version, but we can then use it for example to generate a recommended citation as we are setting up in https://github.com//pull/239

farhanreynaldo commented on 2021-11-10T05:44:36Z
----------------------------------------------------------------

Understood, I have updated it in the latest commit. Let me know if I missed anything.

Copy link
Contributor Author

I've tried both approaches, and it showed a practically similar figure. I think we could remove the tight_layout.


View entire conversation on ReviewNB

Copy link
Contributor Author

Sure, I will add the authors section as you suggested.

Plus adding the names only to the post directive at the top of the notebook

I am not quite sure I understand this part, could clarify this?


View entire conversation on ReviewNB

Copy link
Contributor Author

The kde can' be generated for constants and it has never been, but somehow we have changed from silently failing and plotting nothing without warning to erroring out.

That's make sense. I will ignore the diagonal of chol_corr in this case.

And also when I opened this issue in Arviz, are you suggesting proposing warning and no plot when the constant cases happened?


View entire conversation on ReviewNB

Copy link
Member

sounds great


View entire conversation on ReviewNB

Copy link
Member

Yes, I think warning plus skipping to plot the variable is ideal. We can't plot, but simply skipping it is potentially confusing to users


View entire conversation on ReviewNB

Copy link
Member

OriolAbril commented Nov 9, 2021

a bit like it's done with tags. The spline notebook does it: https://github.com/pymc-devs/pymc-examples/blob/appearance/examples/splines/spline.ipynb. The metadata we add in this post directive is hidden in the rendered version, but we can then use it for example to generate a recommended citation as we are setting up in #239

I will add everything to the style guide wiki once we merge #293, sorry we are moving the finish line while you work


View entire conversation on ReviewNB

Copy link
Contributor Author

Sure, I've updated it on the latest commit.


View entire conversation on ReviewNB

Copy link
Contributor Author

Understood, I have updated it in the latest commit. Let me know if I missed anything.


View entire conversation on ReviewNB

@review-notebook-app
Copy link

review-notebook-app bot commented Nov 12, 2021

View / edit / reply to this conversation on ReviewNB

OriolAbril commented on 2021-11-12T12:40:14Z
----------------------------------------------------------------

Line #7.        coords={"param": ["a"], "param_bis": ["b"]},

This is also skipping one of the elements of chol_stds. I think we need to do two plots, one for all vars like right now but adding chol_corr to the variables to skip, then in the cell below plotting chol_corr only. I would also use the example of dataarray for indexing with the "extra" pointwise selection dim. Here it is an overkill, but I think it's good for illustrative purposes, as already in 3d we'd need to use that.

Here we can select by doing param=a, param_bis=b

# a    , b  param/param_bis

[[ 1, corr], # a
[ corr, 1]] # b

but in 3d we can't and need to be able to pass the pairs we are interested in insead of a "square" slice:

# a     , b    , c  param/param_bis
[[ 1, corr1, corr2], # a
[ corr1, 1, corr3], # b
[ corr2, corr3, 1]] # c

param=[a, b], param_bis=[b, c] is not valid, and neither is param=a, param_bis=c



farhanreynaldo commented on 2021-11-13T13:15:37Z
----------------------------------------------------------------

This is also skipping one of the elements of chol_stds. I think we need to do two plots, one for all vars like right now but adding chol_corr to the variables to skip, then in the cell below plotting chol_corr only.

Don't you think it would be confusing for the readers if we separate the plot between the chol_corr and the rest of the vars? We do this due to the limitation of current implementation, right?

Here it is an overkill, but I think it's good for illustrative purposes, as already in 3d we'd need to use that.

I am relatively new and unfamiliar with the xarray, but from my current understanding the array forcovariation_intercept_slope_idata.posterior['chol_corr'] is 2-dimensional, as shown below:

array([[[[ 1.        , -0.28202288],
         [-0.28202288,  1.        ]],
    [[ 1.        , -0.73111316],
     [-0.73111316,  1.        ]],
    ...

Did I understand it incorrectly?

And also if we are going to proceed to separate the plots, is the following code is the correct implementation?

# without chol_corr
az.plot_trace(
covariation_intercept_slope_idata,
var_names=["~z", "~chol", "~chol_corr"],
compact=True,
chain_prop={"ls": "-"},
);

and the next cell is

az.plot_trace(
covariation_intercept_slope_idata,
var_names=["chol_corr"],
lines=[("chol_corr", {}, 0.0)],
compact=True,
chain_prop={"ls": "-"},
coords={"param": ["a"], "param_bis": ["b"]},
);

farhanreynaldo commented on 2021-11-13T13:22:19Z
----------------------------------------------------------------

Or do you mean adding coords={"param": ["a"], "param_bis": ["b"]} would also affected the county and chol_stds variables?

OriolAbril commented on 2021-11-13T16:58:32Z
----------------------------------------------------------------

Yes, it is two dimensional which is why it works in this case, but it only works in 2d which is not always the case, and I think we should use the example from the issue _even if it's not necessary here_ so that the notebook is useful also for people who might have >=3d corr matrices.

Or do you mean adding coords={"param": ["a"], "param_bis": ["b"]} would also affected the county and chol_stds variables?

Yes, it has modified the output of the chol_stds plot. coords selection is applied to the whole dataset and therefore it affects all variables that have those dimensions (so county is not really affected because it's multidim but with other dimensions).

The code example you shared with the two cells is the idea I was proposing. The coords need to be updated to use this pointwise selection (again, not necessary here because we are only in 2d but needed for 3 or more dims). You can also use var_names="chol_corr" the list is no necessary.

@OriolAbril
Copy link
Member

Thanks for the great work so far, left one last comment before merging

Copy link
Contributor Author

Or do you mean adding coords={"param": ["a"], "param_bis": ["b"]} would also affected the county and chol_stds variables?


View entire conversation on ReviewNB

Copy link
Member

Yes, it is two dimensional which is why it works in this case, but it only works in 2d which is not always the case, and I think we should use the example from the issue _even if it's not necessary here_ so that the notebook is useful also for people who might have >=3d corr matrices.

Or do you mean adding coords={"param": ["a"], "param_bis": ["b"]} would also affected the county and chol_stds variables?

Yes, it has modified the output of the chol_stds plot. coords selection is applied to the whole dataset and therefore it affects all variables that have those dimensions (so county is not really affected because it's multidim but with other dimensions).

The code example you shared with the two cells is the idea I was proposing. The coords need to be updated to use this pointwise selection (again, not necessary here because we are only in 2d but needed for 3 or more dims). You can also use var_names="chol_corr" the list is no necessary.


View entire conversation on ReviewNB

@OriolAbril
Copy link
Member

changes look great, but the notebook doesn't look executed now, can you make sure it is executed with the outputs stored in the ipynb file? The preview looks like this: https://pymc-examples--246.org.readthedocs.build/en/246/case_studies/multilevel_modeling.html

@farhanreynaldo
Copy link
Contributor Author

changes look great, but the notebook doesn't look executed now, can you make sure it is executed with the outputs stored in the ipynb file? The preview looks like this: https://pymc-examples--246.org.readthedocs.build/en/246/case_studies/multilevel_modeling.html

My bad, I forgot to rerun the notebook, I am on it.

@farhanreynaldo
Copy link
Contributor Author

Should I also add the epilogue section in this notebook?

@OriolAbril
Copy link
Member

Should I also add the epilogue section in this notebook?

no, we might move this one to pymc again in order to maintain a handful of base notebooks always synced with the latest releases and versioned, so those might look a bit different than the rest of examples.

@OriolAbril OriolAbril merged commit 86da193 into pymc-devs:main Nov 14, 2021
@farhanreynaldo
Copy link
Contributor Author

Thank you so much for your guidance for this PR @OriolAbril!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants