Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add GLM Out of Sample Predictions Notebook #37

Merged
merged 12 commits into from
Mar 8, 2021

Conversation

juanitorduz
Copy link
Collaborator

I would like to add a small example to the docs. See #33 and the original post https://juanitorduz.github.io/glm_pymc3/. I am open for suggestions and changes to make this notebook useful for the community.

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@review-notebook-app
Copy link

review-notebook-app bot commented Feb 23, 2021

View / edit / reply to this conversation on ReviewNB

MarcoGorelli commented on 2021-02-23T14:55:33Z
----------------------------------------------------------------

*in the design matrix


twiecki commented on 2021-03-02T08:48:21Z
----------------------------------------------------------------

typo still needs to be fixed.

juanitorduz commented on 2021-03-02T09:03:47Z
----------------------------------------------------------------

Thanks, this is fixed now.

@review-notebook-app
Copy link

review-notebook-app bot commented Feb 23, 2021

View / edit / reply to this conversation on ReviewNB

MarcoGorelli commented on 2021-02-23T14:55:33Z
----------------------------------------------------------------

This could be the case when one-hot-encoding categorical variables.

This isn't something I've come across, and I'm not sure I see how that would happen as the one-hot encoding using training set information only - do you have a reference?


twiecki commented on 2021-03-02T08:48:57Z
----------------------------------------------------------------

Yeah I also don't get this point.

juanitorduz commented on 2021-03-02T09:04:53Z
----------------------------------------------------------------

Ok, maybe I needed to explain this better but this is really not relevant so I removed this confusing remark.

@review-notebook-app
Copy link

review-notebook-app bot commented Feb 23, 2021

View / edit / reply to this conversation on ReviewNB

MarcoGorelli commented on 2021-02-23T14:55:34Z
----------------------------------------------------------------

are you sure that

fpr, tpr, thresholds = roc_curve(
    y_true=y_test, y_score=y_test_pred, pos_label=1, drop_intermediate=False
)

is correct? Looking at the sklearn docs , it seems that the second argument (y_score) should be the raw predictions p_test_pred rather than the predicted labels y_test_pred . Indeed, if you look at the examples in the sklearn docs, they pass clf.predict_proba(X)[:, 1]


twiecki commented on 2021-03-02T08:52:36Z
----------------------------------------------------------------

Yes, that's correct, needs to be fixed.

juanitorduz commented on 2021-03-02T09:05:08Z
----------------------------------------------------------------

this is fixed now

@review-notebook-app
Copy link

review-notebook-app bot commented Feb 23, 2021

View / edit / reply to this conversation on ReviewNB

MarcoGorelli commented on 2021-02-23T14:55:35Z
----------------------------------------------------------------

output *by the model


juanitorduz commented on 2021-03-02T09:05:33Z
----------------------------------------------------------------

this is fixed now.

@review-notebook-app
Copy link

review-notebook-app bot commented Feb 23, 2021

View / edit / reply to this conversation on ReviewNB

MarcoGorelli commented on 2021-02-23T14:55:35Z
----------------------------------------------------------------

Let us *now


@juanitorduz
Copy link
Collaborator Author

View / edit / reply to this conversation on ReviewNB

MarcoGorelli commented on 2021-02-23T14:55:33Z

This could be the case when one-hot-encoding categorical variables.

This isn't something I've come across, and I'm not sure I see how that would happen as the one-hot encoding using training set information only - do you have a reference?

This is related to (this post)[https://dzone.com/articles/pandasscikit-learn-get-dummies-testtrain-sets], where once the model is deployed it might see new categories. One could try to use handle_unknown = 'ignore' if this is expected (and wanted), but then in the cross-validation step if we use get_dummies we are already showing the model what to expect. Maybe I can just delete this line as it is not very relevant for the post itself :)

@juanitorduz
Copy link
Collaborator Author

View / edit / reply to this conversation on ReviewNB

MarcoGorelli commented on 2021-02-23T14:55:34Z

are you sure that

fpr, tpr, thresholds = roc_curve(
    y_true=y_test, y_score=y_test_pred, pos_label=1, drop_intermediate=False
)
is correct? Looking at the sklearn docs , it seems that the second argument (y_score) should be the raw predictions p_test_pred rather than the predicted labels y_test_pred . Indeed, if you look at the examples in the sklearn docs, they pass clf.predict_proba(X)[:, 1]

You are totally right 🙈 ! Thank you so much! I corrected the typos and this error on this commit.

@MarcoGorelli MarcoGorelli self-requested a review February 23, 2021 15:55
@MarcoGorelli
Copy link
Contributor

Cool, thanks @juanitorduz - looks good to me, I would just advise to add a prior predictive check (even though it's not the main focus of your notebook, it's probably something to encourage where possible)

@juanitorduz
Copy link
Collaborator Author

@MarcoGorelli I added a subsection on prior predictive checks. Let me know what you think about it 🤓 .

@review-notebook-app
Copy link

review-notebook-app bot commented Mar 1, 2021

View / edit / reply to this conversation on ReviewNB

OriolAbril commented on 2021-03-01T01:48:51Z
----------------------------------------------------------------

Should be az.summary, latest pymc3 has removed the aliases to arviz functions and won't work anymore


juanitorduz commented on 2021-03-02T09:05:52Z
----------------------------------------------------------------

this is fixed now.

@juanitorduz
Copy link
Collaborator Author

Thanks @OriolAbril ! I fixed it :)

@review-notebook-app
Copy link

review-notebook-app bot commented Mar 1, 2021

View / edit / reply to this conversation on ReviewNB

ricardoV94 commented on 2021-03-01T10:35:52Z
----------------------------------------------------------------

Should be patsy, not pasty, right?


juanitorduz commented on 2021-03-02T09:06:26Z
----------------------------------------------------------------

this is fixed now.

@review-notebook-app
Copy link

review-notebook-app bot commented Mar 1, 2021

View / edit / reply to this conversation on ReviewNB

ricardoV94 commented on 2021-03-01T10:35:52Z
----------------------------------------------------------------

patsy?


@review-notebook-app
Copy link

review-notebook-app bot commented Mar 1, 2021

View / edit / reply to this conversation on ReviewNB

ricardoV94 commented on 2021-03-01T10:35:53Z
----------------------------------------------------------------

but taking the mean -> by taking the mean


@review-notebook-app
Copy link

review-notebook-app bot commented Mar 1, 2021

View / edit / reply to this conversation on ReviewNB

ricardoV94 commented on 2021-03-01T10:35:54Z
----------------------------------------------------------------

Missing space after period. "data set.We"


juanitorduz and others added 2 commits March 1, 2021 11:57
Co-authored-by: Thomas Wiecki <thomas.wiecki@gmail.com>
@juanitorduz
Copy link
Collaborator Author

Thank you @ricardoV94 I fixed your suggestions! (I always have issues naming patsy 😆 )!

@juanitorduz juanitorduz requested a review from twiecki March 1, 2021 11:52
Copy link
Member

twiecki commented Mar 2, 2021

typo still needs to be fixed.


View entire conversation on ReviewNB

Copy link
Member

twiecki commented Mar 2, 2021

Yeah I also don't get this point.


View entire conversation on ReviewNB

Copy link
Member

twiecki commented Mar 2, 2021

Yes, that's correct, needs to be fixed.


View entire conversation on ReviewNB

@review-notebook-app
Copy link

review-notebook-app bot commented Mar 2, 2021

View / edit / reply to this conversation on ReviewNB

twiecki commented on 2021-03-02T09:00:29Z
----------------------------------------------------------------

typo: interaction


juanitorduz commented on 2021-03-02T09:11:48Z
----------------------------------------------------------------

fixed

@review-notebook-app
Copy link

review-notebook-app bot commented Mar 2, 2021

View / edit / reply to this conversation on ReviewNB

twiecki commented on 2021-03-02T09:00:30Z
----------------------------------------------------------------

Cell repeated.


juanitorduz commented on 2021-03-02T09:07:02Z
----------------------------------------------------------------

this is fixed now.

@review-notebook-app
Copy link

review-notebook-app bot commented Mar 2, 2021

View / edit / reply to this conversation on ReviewNB

twiecki commented on 2021-03-02T09:00:30Z
----------------------------------------------------------------

I feel like the prior predictive check doesn't really add anything here and is rather confusing. I'd suggest to remove it.


juanitorduz commented on 2021-03-02T09:23:33Z
----------------------------------------------------------------

The initial version of the notebook did not have any prior predictive checks and I was suggested to add some. Should I remove it completely or do you have any ideas or suggestions on how to do it so that it fits the purpose of the example?

MarcoGorelli commented on 2021-03-02T14:22:51Z
----------------------------------------------------------------

Hey - sorry, the prior checks were my suggestion (was just trying to encourage them as part of the Bayesian workflow) but I have no objection to them being removed

ricardoV94 commented on 2021-03-02T15:19:49Z
----------------------------------------------------------------

I agree with having them removed. This makes it more straightforward to get to the point of this notebook.

juanitorduz commented on 2021-03-02T15:38:20Z
----------------------------------------------------------------

As there is no strong opinion to keep this section, I removed it from the notebook.

Copy link
Collaborator Author

Thanks, this is fixed now.


View entire conversation on ReviewNB

Copy link
Collaborator Author

Ok, maybe I needed to explain this better but this is really not relevant so I removed this confusing remark.


View entire conversation on ReviewNB

Copy link
Collaborator Author

this is fixed now


View entire conversation on ReviewNB

Copy link
Collaborator Author

this is fixed now.


View entire conversation on ReviewNB

3 similar comments
Copy link
Collaborator Author

this is fixed now.


View entire conversation on ReviewNB

Copy link
Collaborator Author

this is fixed now.


View entire conversation on ReviewNB

Copy link
Collaborator Author

this is fixed now.


View entire conversation on ReviewNB

Copy link
Collaborator Author

fixed


View entire conversation on ReviewNB

Copy link
Collaborator Author

you are right, I removed it.


View entire conversation on ReviewNB

Copy link
Collaborator Author

I added this reference into the resources section.


View entire conversation on ReviewNB

@juanitorduz
Copy link
Collaborator Author

Thank you all for your suggestions and catching my typos 🙈 (it is hard to catch them with a 1-year old learning to walk around you 😄 )!

Copy link
Collaborator Author

The initial version of the notebook did not have any prior predictive checks and I was suggested to add some. Should I remove it completely or do you have any ideas or suggestions on how to do it so that it fits the purpose of the example?


View entire conversation on ReviewNB

@review-notebook-app
Copy link

review-notebook-app bot commented Mar 2, 2021

View / edit / reply to this conversation on ReviewNB

ricardoV94 commented on 2021-03-02T14:18:39Z
----------------------------------------------------------------

Is there a reason why you choose tho add Gaussian noise before the logistic transformation, instead of using the bernoulli noise? Are they actually equivalent? Some research suggests it is not. It's not a big deal, but it's not a logistic regression model anymore... This would be:

intercept =  - 0.5
beta_x1 = 1
beta_x2 = -1
beta_interaction = 2
z = intercept + beta_x1 * x1 + beta_x2 * x2 + beta_interaction * x1 * x2
p = 1 / (1 + np.exp(-z))
y = stats.bernoulli(p=p).rvs()

The intermediate variables (intercept, beta_x1, beta_x2, beta_interaction) are not relevant for the discussion. I proposed them because it may help readers to disentangle what is data and what are model parameters.

juanitorduz commented on 2021-03-02T15:16:01Z
----------------------------------------------------------------

Thank you for pointing this out! I actually just wanted to generate noisy data, but this remark makes sense. I changed it following your suggestion :)

Copy link
Contributor

Hey - sorry, the prior checks were my suggestion (was just trying to encourage them as part of the Bayesian workflow) but I have no objection to them being removed


View entire conversation on ReviewNB

@twiecki
Copy link
Member

twiecki commented Mar 2, 2021

@MarcoGorelli Yeah, when I read that I thought "this is a great suggestion" and I love prior checks but now seeing it implemented I don't think it really benefits much here and I can't think of a way to improve it, so I would argue to remove it again. What do you think?

Copy link
Member

@ricardoV94 ricardoV94 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a comment about the true data model specification. It's not a blocker, but I would at least mention it in the notebook.

Copy link
Collaborator Author

Thank you for pointing this out! I actually just wanted to generate noisy data, but this remark makes sense. I changed it following your suggestion :)


View entire conversation on ReviewNB

Copy link
Member

I agree with having them removed. This makes it more straightforward to get to the point of this notebook.


View entire conversation on ReviewNB

Copy link
Member

@ricardoV94 ricardoV94 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Collaborator Author

As there is no strong opinion to keep this section, I removed it from the notebook.


View entire conversation on ReviewNB

@twiecki twiecki merged commit 3593fe9 into pymc-devs:main Mar 8, 2021
@twiecki
Copy link
Member

twiecki commented Mar 8, 2021

Thanks @juanitorduz!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants