Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Formula transfer updated to StatsModels v0.6.0 #317

Merged
merged 5 commits into from
Aug 17, 2019
Merged

Formula transfer updated to StatsModels v0.6.0 #317

merged 5 commits into from
Aug 17, 2019

Conversation

dmbates
Copy link
Member

@dmbates dmbates commented Jul 1, 2019

Currently some of the tests are broken because the StatsModels.@formula macro does not preserve the original expression passed to it. The test could be changed to compare, e.g.

RObject(@formula y ~ (a + b) * c)

to

R"y ~ a + b + c + a:b + b:c"

which would look a little peculiar or, my preferred solution, create an exorig field in a FormulaTerm like that in a FunctionTerm. @kleinschmidt prefers not to do that. See discussion in #306

@dmbates dmbates requested a review from simonbyrne July 1, 2019 19:12
@kleinschmidt
Copy link
Contributor

Any way we solve this (other than changing the tests) is going to require some fairly deep changes to StatsModels.

Keeping the original formula expression around is a little tricky, since the formula macro doesn't actually explicitly create a call to FormulaTerm, it creates a call to ~ with Terms as arguments.

If we want to make the formula more round-trippable (at least until apply_schema), one (admittedly extreme) solution would be to completely get rid of the syntactic transformations in the macro stage, and instead just create a "wrapped" version of the AST (sort of like what @oxinabox proposed for JuliaStats/StatsModels.jl#117, and JuliaStats/StatsModels.jl#119) where the symbols for called functions have been evaled into their actual function objects and other symbols are wrapped as Terms. And then handle the application of special handling of +, *, and & at the apply_schema stage.

@dmbates
Copy link
Member Author

dmbates commented Aug 5, 2019

bump. It would help to have this merged and a new release made.

@kleinschmidt
Copy link
Contributor

kleinschmidt commented Aug 6, 2019

What's blocking here? I don't think it's worth waiting for changes to StatsModels. It's just not going to be possible to have full round-trip-ability of formulae between R and Julia I'm afraid because of the way that non-special function calls are handled (e.g., you can't create a formula like @formula(y ~ 1 + log(x+y) at run time if log(x) gets converted to (x,y) -> log(x+y)). I stand corrected, with @eval you can do that I guess (which is what @dmbates has implemented here at my suggestion)

@dmbates
Copy link
Member Author

dmbates commented Aug 6, 2019

@kleinschmidt I don't think there is anything blocking. I think it just fell through the cracks. I didn't want to go ahead and merge it without @simonbyrne or @randy3k taking a look at it first and I imagine they are both busy. I'm afraid I let it fall off of my ToDo list to remind them about it.

@dmbates
Copy link
Member Author

dmbates commented Aug 6, 2019

@kleinschmidt I don't think anything is blocking here. I didn't want to merge without @simonbyrne or @randy3k taking a look and they are both busy, I assume. I know Simon started a new job in the last few months. I'm afraid I let prodding them slip off my ToDo list and I am trying to resurrect this.

@dmbates
Copy link
Member Author

dmbates commented Aug 6, 2019

Sorry for saying the same thing twice. The first comment didn't appear initially so I thought I had somehow deleted it instead of hitting "Comment".

src/RCall.jl Outdated Show resolved Hide resolved
Formula(ex_orig, ex, lhs, rhs)
function rcopy(::Type{FormulaTerm}, l::Ptr{LangSxp})
expr = rcopy(Expr, l)
@eval StatsModels.@formula($expr)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't eval inside a function. Is there another interface we can use?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid not. At least not if we want to support calls to functions that are not part of the formula language (like log).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the problem is that eval is called at top-level, so this probably won't actually work.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be tempted to merge this PR and get a new version of RCall on the registry, even if it is likely that this will not work perfectly. Right now you can't install RCall if you want to have recent versions of GLM, MixedModels, etc. installed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it does work to call @eval in this case. At least the following simple test seems okay.

julia> rcopy(R"y ~ 1 + A + B + log(C)")
FormulaTerm
Response:
  y(unknown)
Predictors:
  1
  A(unknown)
  B(unknown)
  (C)->log(C)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The question is if it works correctly from inside a function?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How deeply inside a function do you need?

julia> using RCall

R> form <- y ~ 1 + A + B + log(C)

julia> function foo()
           rcopy(R"form")
       end
foo (generic function with 1 method)

julia> foo()
FormulaTerm
Response:
  y(unknown)
Predictors:
  1
  A(unknown)
  B(unknown)
  (C)->log(C)

@dmbates
Copy link
Member Author

dmbates commented Aug 15, 2019

Any objections to my merging this? The CI failures are caused by the inability to install the R packages on a Mac for testing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants