`posterior_linpred()` for ordinal families: argument for taking the intercept into account #1137

fweber144 · 2021-04-12T11:08:52Z

This PR introduces an argument (incl_thres) to posterior_linpred() for taking the intercept into account (in case of an ordinal family) which is required for the augmented-data approach in projpred.
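To illustrate what "taking the intercept into account" could mean for an ordinal family, here is a minimal base-R sketch. It is not the brms implementation. It assumes a cumulative model in which each threshold enters as threshold minus linear predictor; the object names (`eta`, `thres`, `eta_thres`) and the draws x observations x thresholds layout are assumptions made for illustration only.

```r
# Hypothetical posterior draws: 4 draws, 3 observations, 2 thresholds
set.seed(1)
n_draws <- 4
n_obs <- 3
n_thres <- 2

# Linear predictor without thresholds (draws x observations)
eta <- matrix(rnorm(n_draws * n_obs), nrow = n_draws, ncol = n_obs)

# Ordered thresholds per draw (draws x thresholds)
thres <- t(apply(matrix(rnorm(n_draws * n_thres), n_draws, n_thres), 1, sort))

# Threshold-aware linear predictor: element [s, n, k] is thres[s, k] - eta[s, n]
eta_thres <- array(NA_real_, dim = c(n_draws, n_obs, n_thres))
for (k in seq_len(n_thres)) {
  # thres[, k] has one value per draw and recycles down the rows of eta
  eta_thres[, , k] <- thres[, k] - eta
}
dim(eta_thres)
```

The point of the sketch is only that an ordinal model has one linear predictor per threshold, so the result gains a third dimension compared to the usual draws x observations matrix.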

A different topic: In line 106 of brms/R/posterior_epred.R (as of commit 76fcc83):

out <- get_dpar(object, dpar = dpar, ilink = TRUE)

I don't quite understand why ilink = TRUE is used there (since scale = "linear" should also be possible there, right?). But I guess it's correct and I just don't get it. In case it's not correct: Do you want me to open an issue?

fweber144 · 2021-04-12T11:17:20Z

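Of course, you can delete the comment in this line (https://github.com/fweber144/brms/blob/0a68dae163126b58beeef8fea008b048c1b1edcb/R/posterior_epred.R#L169) if you agree that using sweep() with its recycling checks is preferable.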
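As a small illustration of the recycling checks mentioned here, the following base-R sketch compares sweep() with plain arithmetic recycling. The matrix and the offsets are made-up values, not taken from brms.

```r
x <- matrix(1:6, nrow = 2, ncol = 3)
col_offsets <- c(10, 20, 30)

# sweep() subtracts col_offsets[j] from every element of column j
swept <- sweep(x, MARGIN = 2, STATS = col_offsets, FUN = "-")

# Plain recycling runs down the columns instead, so the offsets are
# applied to the wrong elements without any warning
naive <- x - col_offsets

# With a STATS vector that does not recycle exactly across the margin,
# sweep() issues a warning (check.margin = TRUE is the default)
res <- tryCatch(
  sweep(x, MARGIN = 2, STATS = c(10, 20), FUN = "-"),
  warning = function(w) "warning"
)
```

Here `swept` and `naive` differ although both calls run silently, which is the kind of mistake the check in sweep() helps to catch when dimensions do not match.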

paul-buerkner · 2021-04-12T20:07:43Z

Thanks for working on this issue! I will make some edits to the PR before merging, so it may take a couple of days. The line you mentioned is correct as is.

fweber144 · 2021-04-13T06:12:09Z

Sure

paul-buerkner · 2021-04-29T13:36:37Z

Thanks again for our call yesterday. I have taken the liberty to push my changes directly to this PR so that you can continue working on it as well. I have implemented the basic functionality for the envisioned incl_thres feature.

There is some more work to do, which I don't have time for at the moment and I was hoping that perhaps you can continue with it, if you like and have time. Specifically, there are the following TODOs remaining:

Refactor the remaining d<ordinal-family> functions following what I have done with dcumulative in distributions.R.
Implement the (extended) link functions corresponding to the (extended) inverse link functions. This requires doing some math first because I don't know if anybody has actually written out these extended links somewhere already. The math shouldn't be too complicated though.
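For the simplest case, the relation between an inverse link and its link can be sketched as follows. This is only an illustration for a cumulative model with logit link and a single observation; the function names `inv_link_cum_sketch()` and `link_cum_sketch()` are hypothetical, and the derivation eventually used in brms may look different, in particular for the other ordinal families.

```r
# Inverse link: thresholds minus linear predictor -> category probabilities
inv_link_cum_sketch <- function(z) {
  # plogis(z) gives P(Y <= k); differencing yields P(Y = k)
  diff(c(0, plogis(z), 1))
}

# Link: category probabilities -> thresholds minus linear predictor
link_cum_sketch <- function(p) {
  # Cumulate to P(Y <= k), drop the last value (always 1), then apply logit
  qlogis(cumsum(p)[-length(p)])
}

z <- c(-1, 0.5, 2)
p <- inv_link_cum_sketch(z)
z_back <- link_cum_sketch(p)
```

Applying the link after the inverse link recovers the input, which is the property any such pair of functions has to satisfy.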

fweber144 · 2021-04-30T13:18:32Z

Thanks a lot! The basic case works like a charm (see the test added in tests/local/tests.models_new.R). However, I have found two cases which probably need special handling:

Specifying a formula for the distributional parameter disc (e.g., disc ~ 1).
Grouped thresholds (<...> | thres(th, gr) ~ <...>).

I have started tests for these two special cases in tests/local/tests.models_new.R but of course, those two tests still fail. I can have a look if I can fix these special cases.

I'll tackle the TODOs asap.

paul-buerkner · 2021-04-30T13:36:56Z

Thanks! My implementation should handle these special cases already. In fact, these special cases are one reason my implementation is set up the way it is. Please let me know if you find out why it's not working and I am happy to fix it.

fweber144 · 2021-04-30T13:45:15Z

Sure. I'm realizing that I forgot to take disc into account when performing the check. My bad, sorry.

fweber144 · 2021-04-30T15:02:42Z

I fixed the disc ~ 1 test. However, the second special case (grouped thresholds) remains to be discussed. As explained in my original corresponding unit test (commit 8e7f4fe), the non-matching group/threshold combinations seem to be assigned a value of zero, which is probably misleading. It might be better to replace those zeros with NAs or to use a completely different structure (e.g., a list, as in the original check for that unit test). If neither is an option, perhaps disallow grouped thresholds for incl_thres = TRUE. In commit b690b75, I have now replaced the zeros with NAs, but I don't know if that breaks other code (which could rely on zeros).

paul-buerkner · 2021-04-30T15:19:11Z

Good catch! The reason things are 0 is because the probability of them is 0. NA should be fine as well, but perhaps this can be done conditionally, that is NA for identity link and 0 otherwise?
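The conditional fill suggested here could look roughly like the following base-R sketch. The helper `pad_categories()` and its arguments are hypothetical and only illustrate the idea; they do not reproduce the code in posterior_epred_ordinal().

```r
# Hypothetical helper: pad a draws x categories matrix up to n_cat_max columns.
# For the "identity" link the values are linear predictors, so missing
# categories are filled with NA; otherwise the values are probabilities,
# and a missing category has probability 0.
pad_categories <- function(x, n_cat_max, link) {
  fill <- if (identical(link, "identity")) NA_real_ else 0
  out <- matrix(fill, nrow = nrow(x), ncol = n_cat_max)
  out[, seq_len(ncol(x))] <- x
  out
}

# A group with only 2 of 3 possible categories
vals <- matrix(c(0.2, 0.8, 0.5, 0.5), nrow = 2, byrow = TRUE)
padded_prob <- pad_categories(vals, n_cat_max = 3, link = "logit")
padded_lin <- pad_categories(vals, n_cat_max = 3, link = "identity")
```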

fweber144 · 2021-04-30T15:37:08Z

Good idea, that should work.

paul-buerkner · 2021-05-04T08:32:50Z

Thanks for checking it out! I switched to array syntax so that we can handle the 3D linpred array directly and efficiently in these link functions. I think that will make things easier down the line. If possible, it would be great to have that working for the other families as well.

Frank Weber wrote on Tue, May 4, 2021, 10:17:

Other question: In commit 837a8e5, you switched from matrix syntax to a more general array syntax. Do you want me to do so also for the other ordinal families?

fweber144 · 2021-05-04T09:36:10Z

Ah, yeah that's a good idea. I was too focused on the former matrix syntax to see how using arrays in the inv_link_<...>() functions will make life easier.
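To make the benefit of array syntax concrete, here is a minimal sketch of an inverse link that accepts an array of any dimensionality of at least two, as long as the last dimension indexes the thresholds. It assumes a cumulative model with logit link; the function name `inv_link_cumulative_sketch()` is hypothetical and the code is not the inv_link_cumulative() from this PR.

```r
# x: array holding thresholds minus linear predictor; the last dimension
# indexes thresholds (e.g., draws x observations x thresholds)
inv_link_cumulative_sketch <- function(x) {
  d <- dim(x)
  n_thres <- d[length(d)]
  lead_dims <- d[-length(d)]
  # Flatten all leading dimensions into rows; columns are thresholds
  cdf <- matrix(as.vector(plogis(x)), nrow = prod(lead_dims), ncol = n_thres)
  # P(Y = k) = P(Y <= k) - P(Y <= k - 1), with boundary values 0 and 1
  probs <- cbind(cdf, 1) - cbind(0, cdf)
  # Restore the leading dimensions; the last dimension now indexes categories
  array(probs, dim = c(lead_dims, n_thres + 1))
}

# 2 draws, 3 observations, thresholds -1 and 1, linear predictor 0
x <- array(c(rep(-1, 6), rep(1, 6)), dim = c(2, 3, 2))
p <- inv_link_cumulative_sketch(x)
```

Because only the last dimension is treated specially, the same function works for a 2D matrix and for a 3D array without separate code paths.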

fweber144 · 2021-05-04T09:39:09Z

To get back to the comment from above:

Did you check already if the sratio and cratio families yield the same results when run in brms with a symmetric link function

No, I haven't yet. I'll do so (and perhaps even add this as a unit test).

Families sratio() and cratio() indeed give the same results for symmetric link functions. I'll add unit tests for that. But I think it's better to create a new PR for that so we don't mix up too many different things here.
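The equivalence can be checked outside of brms with a few lines of base R. The sketch below assumes the usual sequential parameterizations, with thresholds minus linear predictor for the stopping-ratio model and linear predictor minus thresholds for the continuation-ratio model; the function names are hypothetical and the code is not taken from brms.

```r
# Category probabilities for one observation; cdf is the inverse link
p_sratio_sketch <- function(eta, tau, cdf) {
  p_stop <- cdf(tau - eta)              # P(Y = k | Y >= k)
  p_reach <- c(1, cumprod(1 - p_stop))  # P(Y >= k)
  c(p_stop, 1) * p_reach
}

p_cratio_sketch <- function(eta, tau, cdf) {
  p_go <- cdf(eta - tau)                # P(Y > k | Y >= k)
  p_reach <- c(1, cumprod(p_go))        # P(Y >= k)
  c(1 - p_go, 1) * p_reach
}

inv_cloglog <- function(z) 1 - exp(-exp(z))  # asymmetric distribution function

eta <- 0.3
tau <- c(-1, 0.5, 2)

same_logit <- all.equal(p_sratio_sketch(eta, tau, plogis),
                        p_cratio_sketch(eta, tau, plogis))
same_probit <- all.equal(p_sratio_sketch(eta, tau, pnorm),
                         p_cratio_sketch(eta, tau, pnorm))
same_cloglog <- all.equal(p_sratio_sketch(eta, tau, inv_cloglog),
                          p_cratio_sketch(eta, tau, inv_cloglog))
```

For a symmetric distribution function, cdf(-z) equals 1 - cdf(z), so the stopping probabilities of both models coincide. With the asymmetric cloglog this identity fails and the two sets of probabilities differ.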

paul-buerkner · 2021-05-04T09:42:03Z

Thanks. Glad to hear things are correctly implemented in brms. And indeed, we can worry about the unit test later and separately.

fweber144 · 2021-05-05T10:54:45Z

Here are the commits with which I would consider this PR to be finished. But of course, if you want me to add or change something, that's no problem. Some notes:

1. I added some internal documentation. I'm not 100% sure it's always correct, so better check it.
2. I basically only "translated" the inv_link_<ordinal_family>() functions from matrix syntax to array syntax. However, I think my unit tests for these inv_link_<ordinal_family>() functions should provide a more efficient implementation since they are vectorized as much as possible. If you want, I can swap the two implementations (the original one from the functions and my new one from the unit tests) in a new PR. Of course, I would then also generalize my implementation from the unit tests to arrays with more than 3 dimensions.
3. As mentioned in a comment of commit 8e7f4fe (these lines), I'm currently re-using the object fit from the previous test. I did so to save computing time when running the tests. However, it breaks the self-containment of that test which is probably bad practice.
4. As mentioned in this comment, I'll create a separate PR for testing the equivalence of the sratio() and the cratio() family in case of symmetric distribution functions.
5. The equivalence of the sratio() and the cumulative() (not cratio()) family in case of the "cloglog" link (see Appendix A of this paper) does not seem to hold in brms. That needs to be checked and when clarified, it should probably be included in the PR for the equivalence of the sratio() and the cratio() family.
6. I have already started some code and the mathematical derivation of the link functions. However, as the link functions are not needed for the new argument incl_thres, I would create a new PR for that if you're OK with it. I also have some other projpred-related code for brms which I would then include in that PR. I think it's best to use the current PR only for argument incl_thres and changes directly related to it.

paul-buerkner · 2021-05-05T12:48:15Z

Great, thank you so much!

1. I will check the doc before merging.
2. I think it is sufficient if the inv_link functions work for 1D to 3D arrays. We can swap the implementations, but only if the speed difference is large enough to make it worthwhile. Did you investigate speed differences already? Personally, I am fine with just keeping it the current way for this PR. We can still change it later on once we have the speed evaluation done.
3. I will edit to make it self-contained.
4. Yes, we can do this in a separate PR, but we can also choose not to do this at all right now since it's not in focus at the moment.
5. Let's ignore this at the moment. I got this equivalence statement from another paper and didn't check it myself in detail. I don't think it's worth investigating at the moment.
6. Sounds good!

paul-buerkner · 2021-05-05T13:32:11Z

I am running checks now. Once they pass, I will merge this PR. Thank you again for working on it!

fweber144 · 2021-05-05T13:37:34Z

Great, thanks. And I'm glad to help. Concerning

We can swap the implementations, but only if the speed difference is large enough to make it worthwhile. Did you investigate speed differences already? Personally, I am fine with just keeping it the current way for this PR. We can still change it later on once we have the speed evaluation done.

I will perform such a speed comparison when I get the time.

fweber144 · 2021-05-05T13:43:08Z

Concerning commit 5c2cdb4: I'm just realizing that you can probably replace none_cat_dims by marg_othdim.

paul-buerkner · 2021-05-05T13:51:34Z

Good catch. Fixed it and will merge now.

fweber144 · 2021-05-06T12:34:34Z

Following up on the speed comparison of the inv_link_<ordinal_family>() implementations that I promised above: You can now find the speed comparison in PR #1155.

fweber144 added 3 commits April 12, 2021 12:50

Introduce an argument to posterior_linpred() for taking the intercept into account (in case of an ordinal family) which is required for the augmented-data approach in projpred.

a609e11

Add a NEWS entry.

fcd2c21

Insert GitHub PR number.

0a68dae

Merge branch 'master' into projpred_augdat

40e43ac

fweber144 and others added 6 commits April 14, 2021 10:54

Merge branch 'master' into projpred_augdat

ec4126c

Merge branch 'master' into projpred_augdat

448e813

Merge branch 'master' into projpred_augdat

7b696c2

add 'slice' function

de47fdd

refactor 'dcumulative'

837a8e5

update implementation of 'incl_thres'

3a431d8

paul-buerkner and others added 3 commits April 29, 2021 16:34

fix typo

63ba13e

Re-indent tests/local/tests.models_new.R

827591f

Add (preliminary) tests for argument incl_thres of posterior_linpred().

8e7f4fe

fweber144 added 4 commits April 30, 2021 16:07

Fix a test for argument incl_thres of posterior_linpred() (the one with `disc ~ 1`).

4adac19

Remove an unnecessary check.

089e506

Fix a typo.

d689fd7

posterior_epred_ordinal() in case of grouped thresholds: Fill missing categories with NA, not zero.

b690b75

fweber144 added 3 commits May 1, 2021 10:29

Merge branch 'master' into projpred_augdat

71eb8eb

posterior_epred_ordinal() in case of grouped thresholds: For the "identity" link, fill missing categories with NAs. Otherwise, fill with zeros.

8eb1157

Replace remaining extract_col() occurrences by slice_col().

650b732

fweber144 added 9 commits May 4, 2021 12:05

Internally document dcumulative() and inv_link_cumulative().

5c1c2e3

In inv_link_cumulative(): Overwrite x.

00148e1

Create and use inv_link_sratio().

64080d3

Create and use inv_link_cratio().

1da7f7e

Create and use inv_link_acat().

b0bde2f

Test that d<ordinal_family>() works correctly.

5b4b3f9

Add argument drop to slice().

5fa503e

inv_link_sratio(), inv_link_cratio(), and inv_link_acat(): Allow for arrays.

aadc683

Test that inv_link_<ordinal_family>() works correctly for arrays.

d7e50bf

paul-buerkner added 3 commits May 5, 2021 15:25

minor cleaning

5c2cdb4

add frank as contributor

2438629

some more minor cleaning

68b4fe7

more cleaning

c841e36

paul-buerkner merged commit c545d81 into paul-buerkner:master May 5, 2021

fweber144 added a commit to fweber144/brms that referenced this pull request May 5, 2021

Separate out and comment out the tests for the equivalence of cumulative() and sratio() for the cloglog link (see <paul-buerkner#1137 (comment)>).

d5242a4

fweber144 deleted the projpred_augdat branch May 5, 2021 14:41

This was referenced May 5, 2021

Tests for equivalence of sratio() and cratio() in case of symmetric distribution functions #1153

Merged

Improve efficiency of the inv_link_<ordinal_family>() functions #1155

Merged

fweber144 mentioned this pull request May 14, 2021

Link functions, inverse-link function for categorical(), other preparations for augmented-data projection #1159

Merged
