-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Programmatic formulas with function calls #130
Comments
Best way to do this, right now, is with a custom term type. https://juliastats.github.io/StatsModels.jl/stable/internals/#extending-1 |
Ok, thanks. I haven't read that section of the docs in detail yet. However, it looks like that capability is targeted at package developers and not very easy to use for an average user who just wants to programmatically create a formula. |
I'd say this is right on the edge of package development level of complexity :) The underlying issue why this isn't easily doable with a function is that the "non-special" function calls like @oxinabox has proposed an alternative to this general design in #117 but it's speculative and low on the priority list. The examples in the docs for creating a using StatsModels
using StatsModels: Schema
fourier(args...) = error("fourier can only be called in a @formula")
struct FourierTerm{T} <: AbstractTerm
term::T
deg::Int
end
FourierTerm(t, deg::ConstantTerm) = FourierTerm(t, deg.n)
function StatsModels.apply_schema(t::FunctionTerm{typeof(fourier)}, sch::Schema, Mod::Type)
term, deg = t.args_parsed
FourierTerm(apply_schema(term, sch, Mod), deg)
end
function StatsModels.modelcols(ft::FourierTerm, d::NamedTuple)
col = modelcols(ft.term, d)
cols = Float64[]
for i in 1:ft.deg
col .*= 2
append!(cols, cos.(col))
append!(cols, sin.(col))
end
return reshape(cols, :, 2*ft.deg)
end
StatsModels.width(ft::FourierTerm) = 2*ft.deg
function StatsModels.coefnames(ft::FourierTerm)
termname = coefnames(ft.term)
["$sc($(2n)$termname)" for sc in ("cos", "sin") for n in 1:ft.deg]
end Then you can do: julia> d = DataFrame(y = rand(100), x = collect(range(0, π, length=100)));
julia> model = lm(@formula(y ~ fourier(x, 3)), d)
StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}
y ~ 1 + FourierTerm{ContinuousTerm{Float64}}(x, 3)
Coefficients:
────────────────────────────────────────────────────────────────────────────────
Estimate Std. Error t value Pr(>|t|) Lower 95% Upper 95%
────────────────────────────────────────────────────────────────────────────────
(Intercept) 0.479894 0.0299066 16.0464 <1e-27 0.420506 0.539283
cos(2x) -0.00607968 0.0420925 -0.144436 0.8855 -0.089667 0.0775076
cos(4x) -0.0108691 0.0424953 -0.255772 0.7987 -0.0952563 0.0735181
cos(6x) 0.0247761 0.0420925 0.588612 0.5575 -0.0588112 0.108363
sin(2x) -0.0520148 0.0424953 -1.22401 0.2240 -0.136402 0.0323724
sin(4x) -0.0488065 0.0420925 -1.15951 0.2492 -0.132394 0.0347808
sin(6x) 0.0615654 0.0424953 1.44876 0.1508 -0.0228218 0.145953
──────────────────────────────────────────────────────────────────────────────── and if you want to see the model matrix that's generated: julia> f = model.mf.f
FormulaTerm
Response:
y(continuous)
Predictors:
1
FourierTerm{ContinuousTerm{Float64}}(x, 3)
julia> modelcols(f.rhs, Tables.columntable(first(d, 10)))
10×7 Array{Float64,2}:
1.0 1.0 0.0 1.0 0.0 1.0 0.0
1.0 0.997987 0.0634239 0.991955 0.126592 0.967949 0.251148
1.0 0.991955 0.126592 0.967949 0.251148 0.873849 0.486197
1.0 0.981929 0.189251 0.928368 0.371662 0.723734 0.690079
1.0 0.967949 0.251148 0.873849 0.486197 0.527225 0.849725
1.0 0.950071 0.312033 0.80527 0.592908 0.29692 0.954902
1.0 0.928368 0.371662 0.723734 0.690079 0.0475819 0.998867
1.0 0.902927 0.429795 0.630553 0.776146 -0.204807 0.978802
1.0 0.873849 0.486197 0.527225 0.849725 -0.444067 0.895994
1.0 0.841254 0.540641 0.415415 0.909632 -0.654861 0.75575 |
I think we should move custom terms to there own page in the docs. |
Agreed. Maybe could combine with the "programmatic construction" bits too since there's some overlap (or even split those into another page). |
@kleinschmidt Wow, thanks for the detailed example! |
this is now generally possible with #183 ! |
...with a more specific pointer in tests added in #274 : https://github.com/JuliaStats/StatsModels.jl/pull/274/files#diff-3dc4ae230d881b69f2f59d25f1fe89dc585f9dcb6f65e37009455c40642fb1d3R23-R28 |
With v0.6.x of StatsModels, it's now easier to programmatically create formulas. However, there doesn't appear to be an easy way to include function calls in those formulas. As a motivating example, I would like to be able to create a formula that includes a Fourier Series up to order
n
. In other words, I might want to make a function that operates as follows:The text was updated successfully, but these errors were encountered: