Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Method ambiguity for lead/lag terms #146

Closed
cpfiffer opened this issue Aug 29, 2019 · 8 comments · Fixed by #152
Closed

Method ambiguity for lead/lag terms #146

cpfiffer opened this issue Aug 29, 2019 · 8 comments · Fixed by #152

Comments

@cpfiffer
Copy link

cpfiffer commented Aug 29, 2019

I'm getting a method error on schema application when using a lead/lag term. I didn't think this was a GLM thing, but I'm happy to move this over there if it is a GLM thing.

EDIT: I slimmed the MWE down even more.

MWE:

using DataFrames, StatsModels, GLM

df = DataFrame(a = randn(10), b=randn(10))
f = @formula(a ~ 1 + lag(b, 1))
fitted = lm(f, df)

Error:

ERROR: MethodError: StatsModels.apply_schema(::FunctionTerm{typeof(lag),getfield(Main, Symbol("##383#384")),(:b,)}, ::StatsModels.FullRank, ::Type{LinearModel}) is ambiguous. Candidates:
  apply_schema(t::AbstractTerm, schema::StatsModels.FullRank, Mod::Type) in StatsModels at /home/cameron/.julia/packages/StatsModels/G9zlM/src/schema.jl:313
  apply_schema(t::FunctionTerm{F,Fanon,Names} where Names where Fanon, sch, ctx::Type) where F<:Union{typeof(lag), typeof(lead)} in StatsModels at /home/cameron/.julia/packages/StatsModels/G9zlM/src/temporal_terms.jl:30
Possible fix, define
  apply_schema(::FunctionTerm{F<:Union{typeof(lag), typeof(lead)},Fanon,Names} where Names where Fanon, ::StatsModels.FullRank, ::Type)
Stacktrace:
 [1] _broadcast_getindex_evalf(::typeof(apply_schema), ::FunctionTerm{typeof(lag),getfield(Main, Symbol("##383#384")),(:b,)}, ::StatsModels.FullRank, ::Type) at ./broadcast.jl:578
 [2] _broadcast_getindex at ./broadcast.jl:551 [inlined]
 [3] (::getfield(Base.Broadcast, Symbol("##19#20")){Base.Broadcast.Broadcasted{Base.Broadcast.Style{Tuple},Nothing,typeof(apply_schema),Tuple{Tuple{ConstantTerm{Int64},FunctionTerm{typeof(lag),getfield(Main, Symbol("##383#384")),(:b,)}},Base.RefValue{StatsModels.FullRank},Base.RefValue{Type{LinearModel}}}}})(::Int64) at ./broadcast.jl:953
 [4] ntuple at ./tuple.jl:160 [inlined]
@kleinschmidt
Copy link
Member

Bummer. These come up more than I'd like :(

@kleinschmidt
Copy link
Member

So #152 fixes the method ambiguity but there's another problem which is that GLM expects a model matrix that has no missings so your MWE still raises an error. I'm not quite sure what the best way of handling this is. Do you have any ideas @oxinabox ?

@cpfiffer
Copy link
Author

cpfiffer commented Sep 13, 2019

I think that might be a GLM issue though, it should probably have some ability to handle matrices with missing values.

@oxinabox
Copy link
Contributor

I feel like we do need a way with getting rid of missings.

Possibly via Impute.jl

It is reasonable to have missings in data and for things like lead and lag to introduce more of them.

It is also reasonable for GLM to not accept them.

@kleinschmidt
Copy link
Member

It would be cool if we could figure out a way to provide what ModelFrame used to provide (a mask of which rows in the underlying data frame the model matrix corresponds to) in this new framework. Maybe some notion of pre- and post- transformers that the terms could provide. Then you'd go through and pull out all the transformers, and terms like lead and lag could say "okay drop these rows from the output because they are undefined" in a way that the modeling consumer could take or leave. Because it's important I think in most workflows to know which bits of the model matrix correspond to which bits of the underlying data.

@kleinschmidt
Copy link
Member

I think maybe MLJ has some kind of system for representing these transforms?

@greimel
Copy link

greimel commented Jul 10, 2020

Is there an issue open about lead/lag not working with GLM.jl due to the introduction of missings?

@kleinschmidt
Copy link
Member

There is in GLM.jl, it's JuliaStats/GLM.jl#366. The closest thing here is #153 but that doesn't address the issue of how to drop the rows with missings.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants