Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update hierarchical partial pooling notebook #219

Merged
merged 10 commits into from
Nov 21, 2021

Conversation

mjhajharia
Copy link
Member

  • adds dims and coords to use plot_forest without yticklabels, @OriolAbril since the arviz label thing is WIP from what i inferred this seems to be an ok way?

  • changed numpy random generator from legacy to default_rng() (the new standard one)

  • also, there wasn't a lot to update in this notebook, it was pretty short, if there's things i'm missing let me know!

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@review-notebook-app
Copy link

View / edit / reply to this conversation on ReviewNB

chiral-carbon commented on 2021-09-01T07:17:15Z
----------------------------------------------------------------

thanks, looks good! do add -p xarray with the watermark.


@OriolAbril
Copy link
Member

does the notebook run without changes with both v3 and v4?

@mjhajharia
Copy link
Member Author

mjhajharia commented Sep 1, 2021 via email

@review-notebook-app
Copy link

review-notebook-app bot commented Sep 16, 2021

View / edit / reply to this conversation on ReviewNB

MarcoGorelli commented on 2021-09-16T20:07:03Z
----------------------------------------------------------------

apply is usually pretty slow, the second line can probably be done much faster with

player_names = data['FirstName'] + ' ' + data['LastName']

Just for an example:

In [2]: import pandas as pd

In [3]: df = pd.DataFrame({'FirstName': ['foo']*1000, 'LastName': ['bar']*1000})

In [4]: %%timeit
   ...: df.apply(lambda x: x.FirstName + " " + x.LastName, axis=1)
   ...: 
   ...: 

14.5 ms ± 1.45 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [5]: %%timeit

   ...: df['FirstName'] + ' ' + df['LastName']
   ...: 
   ...: 

213 µs ± 9.21 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)



michaelosthege commented on 2021-10-08T13:51:03Z
----------------------------------------------------------------

Don't know if it renders well, but a pm.model_to_graphviz might be nice here.

mjhajharia commented on 2021-11-21T00:13:51Z
----------------------------------------------------------------

thanks @michaelosthege it came out nice

Copy link
Contributor

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, just left a comment which is more of an FYI than anything else

@OriolAbril
Copy link
Member

@mjhajharia can you rerun it with v3 for now still and add the post directive and use bibtex references? link to updated style guide: https://github.com/pymc-devs/pymc3/wiki/PyMC3-Jupyter-Notebook-Style-Guide

@mjhajharia mjhajharia force-pushed the main branch 2 times, most recently from 56eaec5 to 00bd4e5 Compare October 6, 2021 21:03
@mjhajharia
Copy link
Member Author

re executed everything with v3. the only changes were replacing aesara.tensor with theano.tensor

Copy link
Member

Don't know if it renders well, but a pm.model_to_graphviz might be nice here.


View entire conversation on ReviewNB

@review-notebook-app
Copy link

review-notebook-app bot commented Oct 8, 2021

View / edit / reply to this conversation on ReviewNB

michaelosthege commented on 2021-10-08T13:53:05Z
----------------------------------------------------------------

Line #2.        trace = pm.sample(2000, tune=2000, chains=2, target_accept=0.95, return_inferencedata=True)

Didn't we decide to call it idata ?


OriolAbril commented on 2021-10-08T14:04:11Z
----------------------------------------------------------------

I think we haven't decided anything yet, the issue is open https://github.com/pymc-devs/pymc/issues/4821 and nothing has been aded to the style guide yet. And it's also unclear if we should enforce those names or offer them as suggestions in case author is not opinionated about naming

@review-notebook-app
Copy link

review-notebook-app bot commented Oct 8, 2021

View / edit / reply to this conversation on ReviewNB

michaelosthege commented on 2021-10-08T13:53:06Z
----------------------------------------------------------------

The names are all in braces. Did they end up as length-1 lists? Maybe doublecheck if model.coords["player_names"] has the right type.


OriolAbril commented on 2021-10-08T14:08:41Z
----------------------------------------------------------------

I was probably the one to do it but I am not sure why. This is the default style for plot_forest, see https://arviz-devs.github.io/arviz/api/generated/arviz.plot_forest.html also for example. The variable name is only shown once, and the coord values which are what differs are written in all cases, but using the same "format". I guess this allows to distinguish [Max Alvis] is coordinate Max Alvis of variable theta whereas Max Alvis is independent variable Max Alvis. I am opinionated on the labelling creation and labeller class, but not really on the actual formats so any suggestions are welcome

mjhajharia commented on 2021-11-21T00:16:47Z
----------------------------------------------------------------

the actual type is correct, var_names="thetas" works as a temporary patch for this case

@review-notebook-app
Copy link

review-notebook-app bot commented Oct 8, 2021

View / edit / reply to this conversation on ReviewNB

OriolAbril commented on 2021-10-08T14:00:33Z
----------------------------------------------------------------

post directive should go in the first cell but under the title. I don't really understand why, but adding this here doesn't work. i.e. there is no hierarchical model tag in https://pymc-examples--219.org.readthedocs.build/en/219/blog/tag.html nor the notebook appears listed in https://pymc-examples--219.org.readthedocs.build/en/219/blog/tag/pymc3model.html.

also cc @martinacantaro, tags page says "hierarchical model" and "linear model" but tags currently in use are "hierarchical" and "linear model", not sure what to do, change hierarchical and add model? leave as is? Once decided on something we do need to enforce it so tags are actually meaningful


@review-notebook-app
Copy link

review-notebook-app bot commented Oct 8, 2021

View / edit / reply to this conversation on ReviewNB

OriolAbril commented on 2021-10-08T14:00:34Z
----------------------------------------------------------------

the text with (1) and (2) feels like it should be a numbered list instead


@review-notebook-app
Copy link

review-notebook-app bot commented Oct 8, 2021

View / edit / reply to this conversation on ReviewNB

OriolAbril commented on 2021-10-08T14:00:35Z
----------------------------------------------------------------

can you also add dims to y. It won't have any effect on the shape but it will define the dims in inferencedata which would be useful if doing complicated postprocessing, they may appear in the graphviz diagram and even as annotation alone they are also helpful I think


Copy link
Member

I think we haven't decided anything yet, the issue is open https://github.com/pymc-devs/pymc/issues/4821 and nothing has been aded to the style guide yet. And it's also unclear if we should enforce those names or offer them as suggestions in case author is not opinionated about naming


View entire conversation on ReviewNB

Copy link
Member

I was probably the one to do it but I am not sure why. This is the default style for plot_forest, see https://arviz-devs.github.io/arviz/api/generated/arviz.plot_forest.html also for example. The variable name is only shown once, and the coord values which are what differs are written in all cases, but using the same "format". I guess this allows to distinguish [Max Alvis] is coordinate Max Alvis of variable theta whereas Max Alvis is independent variable Max Alvis. I am opinionated on the labelling creation and labeller class, but not really on the actual formats so any suggestions are welcome


View entire conversation on ReviewNB

Copy link
Member

@OriolAbril OriolAbril left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

marking as request changes for the post directive thing, the rest of the comments are minor. Once the post directive is right and tags are working consider this approved even if I don't review again

Copy link
Member Author

thanks @michaelosthege it came out nice


View entire conversation on ReviewNB

Copy link
Member Author

the actual type is correct, var_names="thetas" works as a temporary patch for this case


View entire conversation on ReviewNB

@mjhajharia
Copy link
Member Author

btw we could maybe add a test for references.bib syntax errors as well, so it gets fixed in pre-commit or something

@mjhajharia
Copy link
Member Author

checked the render, everything worked out, I did the requested changes but because of rebasing git doesn't recognise them, can someone approve and merge?

@michaelosthege michaelosthege merged commit 7d92add into pymc-devs:main Nov 21, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants