Bug in handling of missing for functions in @formula call #366
Comments
yes, that's because the missing is generated by
This would imply we need another check to drop missings, right?
yes, and this is one of the things that's held up a revamp of the model fitting API in StatsModels.jl (e.g. why we still have ModelFrame/ModelMatrix). One possibility that's been discussed is to just not worry about dropping missings on the input side, and leave it up to the consumer to handle missings (e.g., allow StatsModels Terms to generate modelcols with missings). Then you'd just need one missing removal pass.
The underlying problem is that some terms can introduce missings so you need to do at least one pass after generating the model cols. it seems kind of wasteful to generate a full matrix that's potentially
I'm generally empathetic to having the user (or package like GLM) handle missings on their own. But StatsModels dropping
Is there any update on this? Has this been fixed upstream?
I'm afraid not. It's related to JuliaStats/StatsModels.jl#153 and neither I nor @palday have had much extra bandwidth for that recently.
But if you wanted to make a PR to StatsModels with a proposal for how and where to handle the dropping of missing values created by
@kleinschmidt I ran into this again today. Are some of the internal fixes to StatsModels going to be able to resolve this? I know there is work to allow more customized handling of this stuff.
A temporary kludge would be dropping missings from the resultant
Yeah, that could work. But then we have to allocate that matrix twice. Would be nice to handle this upstream.
Is there just no plan to fix this? The inability to use
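For anyone following along, here is a rough sketch of what that kind of post-hoc pass could look like, written in plain Julia so it does not depend on any StatsModels internals. The helper name `drop_missing_rows`, the hand-rolled lag, and the data are all made up for illustration; this is not how StatsModels or GLM currently does it. It also shows the cost mentioned above: the filtered matrix is a second allocation on top of the one that contains the missings.

```julia
# Hypothetical helper (not part of StatsModels or GLM): keep only the rows
# where neither the response nor any model column is missing.
function drop_missing_rows(y::AbstractVector, X::AbstractMatrix)
    keep = .!ismissing.(y) .& vec(all(!ismissing, X; dims=2))
    # Indexing allocates new arrays; the constructors narrow the eltype to Float64.
    return Vector{Float64}(y[keep]), Matrix{Float64}(X[keep, :])
end

wage = [10.0, 11.0, 12.0, 13.0, 14.0]      # made-up data
lag_wage = [missing; wage[1:end-1]]        # hand-rolled lag: first entry is missing
X = hcat(ones(length(wage)), lag_wage)     # intercept + lagged column, eltype allows missing

y_clean, X_clean = drop_missing_rows(wage, X)
beta = X_clean \ y_clean                   # least squares on the complete rows only
```

The single pass runs after the columns are generated, so it catches missings introduced by a term as well as missings that were already in the input.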
Constructing a lagged variable via
df.lag_wage = lag(df.wage)
and then running a regression works fine. It seems like GLM (or StatsModels) has not caught up to the new ability to specify transformations in the @formula call.
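For reference, a minimal sketch of the workaround described above, assuming `lag` comes from ShiftedArrays.jl and using a made-up `wage` column. The call is written as `ShiftedArrays.lag` so it does not rely on the name being exported. Because the lagged column already exists in the table, its leading missing is visible when rows with missings are dropped, before the model matrix is built.

```julia
using DataFrames, GLM, ShiftedArrays

# Made-up data for illustration only.
df = DataFrame(wage = [10.0, 11.0, 12.5, 13.0, 15.0, 15.5])

# Precompute the lag as an ordinary column; its first entry is missing.
df.lag_wage = ShiftedArrays.lag(df.wage)

# The formula only refers to existing columns, so the row with the missing
# lag is dropped before fitting.
model = lm(@formula(wage ~ lag_wage), df)
```

The downside is the extra column in the data frame, which is what specifying the transformation inside the formula is meant to avoid.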