Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow for sparse arrays and views #446

Merged
merged 9 commits into from
Jan 24, 2022
Merged

Allow for sparse arrays and views #446

merged 9 commits into from
Jan 24, 2022

Conversation

andreasnoack
Copy link
Member

This is an alternative to #443 as discussed in that PR. I'm converting the inputs to Vector since we don't really support anything else even though the structs are parametric. Eventually, we can consider loosening this conversion if we want to support e.g. CuArrays but it would have to be a more extensive rewrite to ensure that things work and that no slow paths are hit in important cases so, for now, I think this PR is the best way forward. cc @bkamins

src/glmfit.jl Outdated Show resolved Hide resolved
@bkamins
Copy link
Contributor

bkamins commented Sep 6, 2021

The CI failures indicate that we use different code paths for actual computations with and without views (the differences are roudoff errors). This is surprising. What is the reason?

@bkamins
Copy link
Contributor

bkamins commented Sep 6, 2021

In general I am OK to close my PR and finalize this. Thank you!

@andreasnoack
Copy link
Member Author

This is surprising. What is the reason?

I agree. I expected them to take the same code path and therefore be exactly equal. I'll try figure out what is going on.

@bkamins
Copy link
Contributor

bkamins commented Jan 20, 2022

@andreasnoack - what is still needed in this PR to merge it?

src/lm.jl Outdated Show resolved Hide resolved
src/lm.jl Outdated Show resolved Hide resolved
@bkamins
Copy link
Contributor

bkamins commented Jan 21, 2022

There is some strange test failure on Julia 1.0 (I would assume that the failing == test should pass). @andreasnoack Do you know the reason?

@nalimilan
Copy link
Member

That failure is indeed weird. I wonder whether it happens at random or whether it's something specific to Julia 1.0 on Mac.

src/glmfit.jl Outdated Show resolved Hide resolved
@codecov-commenter
Copy link

codecov-commenter commented Jan 22, 2022

Codecov Report

Merging #446 (fdd62fc) into master (be96833) will increase coverage by 0.11%.
The diff coverage is 95.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #446      +/-   ##
==========================================
+ Coverage   84.01%   84.12%   +0.11%     
==========================================
  Files           7        7              
  Lines         807      819      +12     
==========================================
+ Hits          678      689      +11     
- Misses        129      130       +1     
Impacted Files Coverage Δ
src/lm.jl 96.06% <83.33%> (-0.69%) ⬇️
src/glmfit.jl 78.59% <100.00%> (+0.22%) ⬆️
src/linpred.jl 77.31% <100.00%> (+0.99%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update be96833...fdd62fc. Read the comment docs.

@andreasnoack
Copy link
Member Author

andreasnoack commented Jan 22, 2022

I can reproduce this locally with Julia 1.0

julia> coef(glm(X, y, Normal(), IdentityLink()))
2-element Array{Float64,1}:
 0.9999999999999999
 1.0

julia> coef(glm(view(X, 1:9, :), view(y, 1:9), Normal(), IdentityLink()))
2-element Array{Float64,1}:
 1.0
 1.0

I don't know why the two versions are taking a different code path somewhere. If somebody has any idea then I'd like to hear it.

@andreasnoack
Copy link
Member Author

This also fails on Julia 1.7 so I'll try to figure out where the divergence is happening.

@bkamins
Copy link
Contributor

bkamins commented Jan 23, 2022

There is the same problem for lm as for glm (tested under Julia 1.0):

julia> coef(lm(X, y))
2-element Array{Float64,1}:
 1.0
 1.0000000000000004

julia> coef(lm(view(X, 1:9, :), view(y, 1:9)))
2-element Array{Float64,1}:
 0.9999999999999999
 1.0

julia> coef(glm(X, y, Normal(), IdentityLink()))
2-element Array{Float64,1}:
 0.9999999999999999
 0.9999999999999999

julia> coef(glm(view(X, 1:9, :), view(y, 1:9), Normal(), IdentityLink()))
2-element Array{Float64,1}:
 0.9999999999999999
 1.0

@andreasnoack
Copy link
Member Author

So this turned out to be pretty dumb. The view was 1:9 but he length was 10 so the two problems weren't identical. I just hadn't added any noise to the dependent variable so it was a perfect fit except for rounding differences. I've now added some noise and made the views the same lengths as the original data.

Copy link
Contributor

@bkamins bkamins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please also bump package version number in this PR and make a release after this?

Thank you!

@nalimilan
Copy link
Member

Good! Though it looks like my suggestions got reverted.

@andreasnoack
Copy link
Member Author

Though it looks like my suggestions got reverted.

Hm. Thanks for catching that. I tried to squash the intermediate commits while rebasing on master and I must somehow have ended up reverting those changes while doing that. I've reintroduced your suggestions.

@andreasnoack
Copy link
Member Author

I've now also included the tests from #466 and bumped the version.

src/linpred.jl Outdated Show resolved Hide resolved
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
@andreasnoack andreasnoack merged commit fa74d14 into master Jan 24, 2022
@nalimilan nalimilan deleted the an/views branch January 24, 2022 09:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants