Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Programmatic formulas with function calls #130

Closed
CameronBieganek opened this issue Jul 11, 2019 · 8 comments
Closed

Programmatic formulas with function calls #130

CameronBieganek opened this issue Jul 11, 2019 · 8 comments

Comments

@CameronBieganek
Copy link

With v0.6.x of StatsModels, it's now easier to programmatically create formulas. However, there doesn't appear to be an easy way to include function calls in those formulas. As a motivating example, I would like to be able to create a formula that includes a Fourier Series up to order n. In other words, I might want to make a function that operates as follows:

julia> make_fourier_formula(2) == @formula(y ~ cos(2x) + cos(4x) + sin(2x) + sin(4x))
true
@oxinabox
Copy link
Contributor

Best way to do this, right now, is with a custom term type.

https://juliastats.github.io/StatsModels.jl/stable/internals/#extending-1

@CameronBieganek
Copy link
Author

Ok, thanks. I haven't read that section of the docs in detail yet. However, it looks like that capability is targeted at package developers and not very easy to use for an average user who just wants to programmatically create a formula.

@kleinschmidt
Copy link
Member

I'd say this is right on the edge of package development level of complexity :) The underlying issue why this isn't easily doable with a function is that the "non-special" function calls like cos are handled by the @formula macro, which creates anonymous functions like x -> cos(2x). So doing this at function time would require using @eval.

@oxinabox has proposed an alternative to this general design in #117 but it's speculative and low on the priority list.

The examples in the docs for creating a poly term are actually pretty close to what you'd want here and I think a better bet. It's fairly straightforward but you can probably get most of the way by modifying that example (it's essentially an MWE for making custom terms). Actually, because I'm procrastinating, here's how I'd do it:

using StatsModels

using StatsModels: Schema

fourier(args...) = error("fourier can only be called in a @formula")

struct FourierTerm{T} <: AbstractTerm
    term::T
    deg::Int
end
FourierTerm(t, deg::ConstantTerm) = FourierTerm(t, deg.n)

function StatsModels.apply_schema(t::FunctionTerm{typeof(fourier)}, sch::Schema, Mod::Type)
    term, deg = t.args_parsed
    FourierTerm(apply_schema(term, sch, Mod), deg)
end

function StatsModels.modelcols(ft::FourierTerm, d::NamedTuple)
    col = modelcols(ft.term, d)
    cols = Float64[]
    for i in 1:ft.deg
        col .*= 2
        append!(cols, cos.(col))
        append!(cols, sin.(col))
    end
    return reshape(cols, :, 2*ft.deg)
end

StatsModels.width(ft::FourierTerm) = 2*ft.deg

function StatsModels.coefnames(ft::FourierTerm)
    termname = coefnames(ft.term)
    ["$sc($(2n)$termname)" for sc in ("cos", "sin") for n in 1:ft.deg]
end

Then you can do:

julia> d = DataFrame(y = rand(100), x = collect(range(0, π, length=100)));

julia> model = lm(@formula(y ~ fourier(x, 3)), d)
StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}

y ~ 1 + FourierTerm{ContinuousTerm{Float64}}(x, 3)

Coefficients:
────────────────────────────────────────────────────────────────────────────────
                Estimate  Std. Error    t value  Pr(>|t|)   Lower 95%  Upper 95%
────────────────────────────────────────────────────────────────────────────────
(Intercept)   0.479894     0.0299066  16.0464      <1e-27   0.420506   0.539283 
cos(2x)      -0.00607968   0.0420925  -0.144436    0.8855  -0.089667   0.0775076
cos(4x)      -0.0108691    0.0424953  -0.255772    0.7987  -0.0952563  0.0735181
cos(6x)       0.0247761    0.0420925   0.588612    0.5575  -0.0588112  0.108363 
sin(2x)      -0.0520148    0.0424953  -1.22401     0.2240  -0.136402   0.0323724
sin(4x)      -0.0488065    0.0420925  -1.15951     0.2492  -0.132394   0.0347808
sin(6x)       0.0615654    0.0424953   1.44876     0.1508  -0.0228218  0.145953 
────────────────────────────────────────────────────────────────────────────────

and if you want to see the model matrix that's generated:

julia> f = model.mf.f
FormulaTerm
Response:
  y(continuous)
Predictors:
  1
  FourierTerm{ContinuousTerm{Float64}}(x, 3)

julia> modelcols(f.rhs, Tables.columntable(first(d, 10)))
10×7 Array{Float64,2}:
 1.0  1.0       0.0        1.0       0.0        1.0        0.0     
 1.0  0.997987  0.0634239  0.991955  0.126592   0.967949   0.251148
 1.0  0.991955  0.126592   0.967949  0.251148   0.873849   0.486197
 1.0  0.981929  0.189251   0.928368  0.371662   0.723734   0.690079
 1.0  0.967949  0.251148   0.873849  0.486197   0.527225   0.849725
 1.0  0.950071  0.312033   0.80527   0.592908   0.29692    0.954902
 1.0  0.928368  0.371662   0.723734  0.690079   0.0475819  0.998867
 1.0  0.902927  0.429795   0.630553  0.776146  -0.204807   0.978802
 1.0  0.873849  0.486197   0.527225  0.849725  -0.444067   0.895994
 1.0  0.841254  0.540641   0.415415  0.909632  -0.654861   0.75575 

@oxinabox
Copy link
Contributor

I think we should move custom terms to there own page in the docs.
And cross reference to internals.
As A) internals sound scarry
B) that page is too long

@kleinschmidt
Copy link
Member

Agreed. Maybe could combine with the "programmatic construction" bits too since there's some overlap (or even split those into another page).

@CameronBieganek
Copy link
Author

@kleinschmidt Wow, thanks for the detailed example!

@kleinschmidt
Copy link
Member

this is now generally possible with #183 !

@kleinschmidt
Copy link
Member

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants