Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

An explictly overloaded FunctionalTerm used in the arguments to julia function used in a formula is always treated as a julia function. #114

Open
oxinabox opened this issue Jun 13, 2019 · 4 comments

Comments

@oxinabox
Copy link
Contributor

Basically:
f = @formula(y ~ (x - poly(x, 2))) calls poly on the vector of numerical values x,
rather than using the appropriate modelcols(::PolyTerm, data)
to workout what to do.

MVE

on current master
The code at the start is just like in
https://juliastats.github.io/StatsModels.jl/latest/internals/

except that the function behind poly is defined to throw an error,
as we want to make sure it is not used.

Turns out is is, and as such an error is thrown

using StatsModels

using StatsBase
# syntax: best practice to define a _new_ function
poly(x, n) = error("poly term should not be invoked directly, only used within a @formula")

# type of model where syntax applies: here this applies to any model type
const POLY_CONTEXT = Any

# struct for behavior
struct PolyTerm <: AbstractTerm
    term::ContinuousTerm
    deg::Int
end

Base.show(io::IO, p::PolyTerm) = print(io, "poly($(p.term), $(p.deg))")

function StatsModels.apply_schema(t::FunctionTerm{typeof(poly)}, sch, Mod::Type{<:POLY_CONTEXT})
    term = apply_schema(t.args_parsed[1], sch, Mod)
    isa(term, ContinuousTerm) ||
        throw(ArgumentError("PolyTerm only works with continuous terms (got $term)"))
    deg = t.args_parsed[2]
    isa(deg, ConstantTerm) ||
        throw(ArgumentError("PolyTerm degree must be a number (got $deg)"))
    PolyTerm(term, deg.n)
end

function StatsModels.modelcols(p::PolyTerm, d::NamedTuple)
    col = modelcols(p.term, d)
    reduce(hcat, [col.^n for n in 1:p.deg])
end

StatsModels.width(p::PolyTerm) = p.deg

StatsBase.coefnames(p::PolyTerm) = coefnames(p.term) .* "^" .* string.(1:p.deg)

#####################################

df = (y=1:10, x = 1:10)
f = @formula(y ~ (x - poly(x, 2)))
f = apply_schema(f, schema(f, df))
modelcols(f, df);

outputs

julia> modelcols(f, df);
ERROR: poly term should not be invoked directly, only used within a @formula
@oxinabox
Copy link
Contributor Author

Ok,
I looked into fixing this today.

Basically we need to move some of the stuff that is done parse! to be done during apply_schema.

One way to do that would be for the default FunctionTerm without special-casing
to define apply_schema to operate recursively on its arguments.

Another option that is closely related and might be part of the above is to have two kinds of FunctionTerms.
I am going to say:

  • CallTerm
    - This represents the fact that some thing julia parses as a Expr(:call,... showed up in the Formula
    - which is created during parse!,
    - it acts like current FunctionTerm does during apply_schema
  • FunctionCallTerm
    • It represents the fact that we are actually going to call a julia function during modelcols
    • which is created out of a CallTerm during apply_schema, if there is no overload specified (Otherwise the CallTerm gets resolved into a LeadLagTerm or a PolyTerm etc is created)
    • it acts like current FunctionTerm does during modelcols

I think this will do it, but one thing I am cacuous about is making sure things like:
poly(poly(x, 2), 2) work correctly.
I think all that is needed for that is for apply_schma to be applied form the inside out.
on CallTerms.
but I am caucious.

Here are my testcases:

x - poly(x, 1)
poly(poly(x, 1), 1)
poly(x-poly(x, 1), 1)
identity(2*x)

@oxinabox
Copy link
Contributor Author

oxinabox commented Jun 13, 2019

I thought as a hack the followomg would work but it appears not to.
It both doesn't solve the problem and it breaks
other things.

function apply_schema(t::FunctionTerm{Fo, Fa, names}, schema, Mod::Type) where {Fo, Fa, names}
    args_schemaed = map(t.args_parsed) do subterm
        apply_schema(subterm, schema, Mod)
    end
    return FunctionTerm{Fo, Fa, names}(t.forig, t.fanon, t.exorig, args_schemaed)
end

@oxinabox
Copy link
Contributor Author

Ok it could be just - that is broken.
I didn't realised it had special handling

@kleinschmidt
Copy link
Member

kleinschmidt commented Jun 19, 2019

I want to add that this choice was intentional, so it's not quite right to say
that this happens incorrectly. It's just the current implementation of
arbitrary functions blocks all special formula syntax below the top call
encountered. Whether or not that's what we ultimately want is up for debate
(#117 ) but it's not exactly incorrect :)

@oxinabox oxinabox changed the title An explictly overloaded FunctionalTerm used in the arguments to julia function used in a formula is (incorrectly) treated as a julia function. An explictly overloaded FunctionalTerm used in the arguments to julia function used in a formula is always treated as a julia function. Jun 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants