-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement HypothesisCoding and ~un-export~ deprecate ContrastsCoding #158
Conversation
Codecov Report
@@ Coverage Diff @@
## master #158 +/- ##
==========================================
+ Coverage 86.11% 86.47% +0.35%
==========================================
Files 9 9
Lines 497 510 +13
==========================================
+ Hits 428 441 +13
Misses 69 69
Continue to review full report at Codecov.
|
I've also implemented sequential difference/sliding/repeated contrasts coding for the hell of it (see #120) |
Contrasts are derived the hypothesis matrix by taking the pseudoinverse: | ||
|
||
```jldoctest hyp | ||
julia> sdiff_contrasts = pinv(sdiff_hypothesis) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
julia> sdiff_contrasts = pinv(sdiff_hypothesis) | |
julia> pinv(sdiff_hypothesis) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't used anywhere but it is more to help with reading...
|
||
""" | ||
mutable struct HypothesisCoding <: AbstractContrasts | ||
hypotheses::Matrix |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be parameterized on the eltype?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe, although it's going to get converted to whatever the eltype of the final model matrix is anyway. But maybe it would help the compiler?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, if the compiler can know the concrete type of such objects in typical situations (which I think is the case), it will generate much more efficient code. That should matter especially if you iterate over rows.
Though I've noticed this affects other contrasts, so maybe for a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it only affect this and ContrastsCoding
(the only other one that involves a manually specified matrix)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And anyway, I don't think the contrasts themselves ever get used in any hot code paths (that's all the ContrastsMatrix
struct...)
base::Nothing | ||
levels::Union{Vector,Nothing} | ||
|
||
function HypothesisCoding(hypotheses, base, levels) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why specify base
if it's always nothing
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's code that tries to access the field. I could overload getproperty for this type instead though, or refactor that code...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since HypothesisCoding
is new, how could existing code try to access that field? :-)
Anyway, in general I think fields are considered private unless specified otherwise. And unexporting ContrastsCoding
is much more breaking. BTW, maybe we should keep exporting it for now until we want to make another breaking release? It's simpler to break many things at the same time than force people to update small things continuously.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. Let's keep it exported but maybe add a warning to teh constructor or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a depwarn because the deprecations weren't quite doing it correctly (giving a spurious method overwritten warning)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The base field is used in the constructor of ContrastsMatrix
. Could change that to use an accessor method but that seems like overkill at this point.
src/contrasts.jl
Outdated
Specify how to code a categorical variable in terms of a *hypothesis matrix*. | ||
For a variable with ``k`` levels, this should be a ``k-1 \times k`` matrix. | ||
Each row of the hypothesis corresponds to the hypothesis tested by the | ||
corresponding predictor, given as the weights given to the "cell mean" of each |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really understand that phrasing. The repetition of "given" doesn't help. :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about "expressed as the weights of the "cell mean" of each of the k
values of the predictor"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah... I guess I don't really understand what "cell means" refers to in this context, but maybe that's OK.
For non-breaking-ness, perhaps it would be better to deprecate ContrastsCoding with a message that it will be un-exported in the next minor release? Rather than removing it right now. |
Yes, that sounds better, but I always forget whether that's possible or not. We could also just keep exporting it for now, and only remove it in the next minor release (when we want to break other, more fundamental things). |
src/contrasts.jl
Outdated
Specify how to code a categorical variable in terms of a *hypothesis matrix*. | ||
For a variable with ``k`` levels, this should be a ``k-1 \times k`` matrix. | ||
Each row of the hypothesis corresponds to the hypothesis tested by the | ||
corresponding predictor, given as the weights given to the "cell mean" of each |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah... I guess I don't really understand what "cell means" refers to in this context, but maybe that's OK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good apart from the suggestion.
Co-Authored-By: Milan Bouchet-Valat <nalimilan@club.fr>
Co-Authored-By: Milan Bouchet-Valat <nalimilan@club.fr>
a9743c2
to
3fc0a8b
Compare
ContrastsCoding
is a footgun unless you really know what you're doing.Hypothesis coding is a more natural way of specifying manual contrasts, since it
directly expresses differences between conditional means of levels in the matrix
(which is the pseudo-inverse of the contrasts matrix). This PR introduces a
HypothesisCoding <: AbstractContrasts
which is exported, and un-exportsContrastsCoding
(and adds a note in the docs/help text that points tohypotheses)