Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update GLM-linear to use numpy.Generator and ArviZ #149

Merged
merged 2 commits into from
May 6, 2021

Conversation

chiral-carbon
Copy link
Collaborator

@chiral-carbon chiral-carbon commented Apr 29, 2021

Addresses issue #81 and aims to make the following changes:

General Updates

  • Uses numpy generator

ArviZ related

  • Uses figsize as kwarg to run plot_trace function

Pending

  • Checking why the manual model runs slower

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@@ -103,9 +103,6 @@
"metadata": {},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though we only use it once, I would keep the rng creation in the 2nd cell plus consequent usage of the rng object, like the example in https://github.com/pymc-devs/pymc3/wiki/PyMC3-Jupyter-Notebook-Style-Guide


Reply via ReviewNB

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'll add that.

@@ -103,9 +103,6 @@
"metadata": {},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also add the versions of theano and xarray? I added how to in the last point at https://github.com/pymc-devs/pymc3/wiki/PyMC3-Jupyter-Notebook-Style-Guide


Reply via ReviewNB

Copy link
Collaborator Author

@chiral-carbon chiral-carbon Apr 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

xarray is not being used in this particular notebook so would it be okay to add its version here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like it's not being used but it is being used by ArviZ under the hood, like pymc3 uses theano under the hood even if we don't import or call it explicitly.

@OriolAbril
Copy link
Member

Looks good, thanks!

I have two extra general comments in addition to the ones above. The first one is making sure that you have executed the notebook sequentially after editing the code to make sure everything runs correctly, the second is to modify the PR title to something more descriptive in the context of this repo, sooner or later all notebooks will be updated for arviz, numpy generator and so on, so finding commits with messages about ArviZ would be difficult, some options that come to mind "update glm-linear to v3 best practices" or "update glm-linear to use numpy.Generator and ArviZ"

@chiral-carbon
Copy link
Collaborator Author

thanks for the suggestions, I'll push these changes.
Regarding running the notebook, I did run it for testing if the code changes work. But in this particular commit, I have only made the code changes without running the notebook, the reason being that for some reason the manual model seems to run faster for me and the glm.from_formula takes more time. I wanted to ask if you might know what the reason for it could be - is there some dependency that I might be missing or something else because of which I get this result?

@chiral-carbon chiral-carbon changed the title Use numpy generator and ArviZ Update GLM-linear to use numpy.Generator and ArviZ Apr 30, 2021
@OriolAbril
Copy link
Member

the reason being that for some reason the manual model seems to run faster for me and the glm.from_formula takes more time. I wanted to ask if you might know what the reason for it could be - is there some dependency that I might be missing or something else because of which I get this result?

I don't really know, if the performance difference is not systematic you can then ignore the comment on the issue. It is also not worth to go into the source code of the glm module as it will probably be deprecated and replaced by bambi (as an external dependency) in v4

@chiral-carbon
Copy link
Collaborator Author

chiral-carbon commented May 1, 2021

Okay, so if I push a new commit with the suggested changes (adding xarray version and keeping default_rng() in cell 2) should I run the notebook too or not? the notebook will execute, but as I mentioned the output in the model cells would change, and since it may be an isolated issue in my case since we don't know the cause, the output might reflect an inaccurate performance of the model if I run it?

@OriolAbril
Copy link
Member

Performance will always vary between computers, the runtimes in the docs are at most orientative of the order of magnitude, we should always execute the notebooks and push the changes when modified

@chiral-carbon
Copy link
Collaborator Author

understood. thanks for clarifying.

@chiral-carbon
Copy link
Collaborator Author

chiral-carbon commented May 1, 2021

so I tried executing %watermark -n -u -v -iv -w (-p theano xarray) and it returned a usage error:

UsageError: unrecognized arguments: (-p theano xarray)

I tried several things and realized that passing -p PACKAGES with a single package works, but it somehow fails with multiple arguments for me.

this seemed to work:


%watermark -n -u -v -iv -p theano 
%watermark -p xarray -w

and it outputs:

Last updated: Sat May 01 2021

Python implementation: CPython
Python version       : 3.8.5
IPython version      : 7.13.0

theano: 1.1.2

pymc3     : 3.11.2
numpy     : 1.18.5
arviz     : 0.11.2
matplotlib: 3.3.4

xarray: 0.17.0

Watermark: 2.2.0

I wanted your input as to whether you also get this error and if the correction I provided will be acceptable since passing (-p theano xarray) is not working in my notebook.

@OriolAbril
Copy link
Member

I think the error comes from the parenthesis. They are in the guide to indicate that this extra args may not be necessary, many notebooks import theano and xarray explicitly and their versions are listed automatically

@chiral-carbon
Copy link
Collaborator Author

okay. running %watermark -n -u -v -iv -w -p theano xarray however also returns a usage error, as it doesn't recognize the second arg passed (in this case xarray). The error in this case is: UsageError: unrecognized arguments: xarray.

Should I push the commit by breaking the %watermark statement into 2 statements to pass the flags as I have mentioned above? It doesn't align with the style guide but prints the package versions as desired.

@OriolAbril
Copy link
Member

My bad, I assumed spaces would work without looking carefully into it. It looks like the separation between theano and xarray has to be a comma instead.

Source: https://github.com/rasbt/watermark/blob/master/watermark/watermark.py#L180

@chiral-carbon
Copy link
Collaborator Author

great, got it!

@OriolAbril OriolAbril merged commit ad38e7d into pymc-devs:main May 6, 2021
@chiral-carbon chiral-carbon added outreachy2021 tracker id Issues used as trackers in the notebook update project, do not close! and removed tracker id Issues used as trackers in the notebook update project, do not close! labels Jun 14, 2021
@chiral-carbon chiral-carbon deleted the glm-linear branch August 27, 2021 14:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants