-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Isolating issues in handling rank-deficiency #324
Conversation
Codecov Report
@@ Coverage Diff @@
## master #324 +/- ##
==========================================
- Coverage 95.44% 94.86% -0.59%
==========================================
Files 23 23
Lines 1515 1576 +61
==========================================
+ Hits 1446 1495 +49
- Misses 69 81 +12
Continue to review full report at Codecov.
|
Just had Cholesky pass on the nightly CI with this versioninfo:
|
And now from a failed run:
|
So it really does look like the interaction of Intel Skylake and development version of Julia that's causing this problem. @andreasnoack Has anything changed in |
Skylake processors do pass on Julia 1.4 / the older LLVM:
|
So it's not just the newer LLVM version on Skylake. I've found a non Xeon Skylake to run the tests on and they don't fail (OpenSuse 15.0):
But even on the same nightly, they do fail on Skylake Xeon (Ubuntu 18.04):
Meanwhile the broadwell CI machines continue to pass. |
It might be the version of OpenBLAS. We upgraded recently and it could probably affect AVX512 systems. Try comparing julia> BLAS.openblas_get_config()
"OpenBLAS 0.3.5 USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell MAX_THREADS=32" There have been issues with AVX512 kernels in OpenBLAS in the past but it might also be that you need to use a different tolerance when determining the rank. |
Skylake machine where the tests passed:
Skylake where the tests failed:
@andreasnoack It does look the OpenBLAS version is playing a role here. |
You can try starting julia with |
There's a middle ground here that I can implement: we just wrap the BLAS pivoted Cholesky so that the order of the linearly independent columns isn't changed and the linearly dependent columns are put all the way to the right. In other words, we let BLAS decide which columns are redundant, but otherwise preserve order. |
I wrote that last comment and did some tests .... and it seems that the original test that started all of this out is so ill-conditioned that even the BLAS delivers different answers on different architectures as to which columns should be pivoted. But then again, so does the QR decomposition. In other words, swapping to QR won't save us here. So then: we either have a really ill conditioned example (inclusive) or there is a latent issue with the use of AVX instructions in OpenBLAS. |
Guess we should well define what the best behavior should be. Do we want to keep the full rank version based on the column ordering, do we want to make sure to always prefer the intercept, do we want to keep the combinations for a categorical variables over just a part set? Sometimes in applied work on wants to make sure to include the feature of interest, yes, but other than that and the intercept I am not sure. I really dislike Stata's random algo to choose which ones to drop so for it to be deterministic to some extent seems desirable. I would maybe use some measure of variability in the column as a way to "prioritize" the selected columns. |
So a couple of thoughts a had after rereading this issue:
|
@dmbates If you care to review, we can be done with this mess! The pivoted QR is now what we're using; I've left the pivoted Cholesky in for now and made it clear in the docs that we make no promise that we'll keep using the same algorithm for rank determination and pivoting. Do check out the docs -- they're not my best explanation ever, but if they're good enough and I haven't been infelicitous in my simplifications, we can merge this change in and improve the docs later. Then the CIs will all be happy again. |
I have made some suggested edits in One thing I did try to distinguish more clearly is the distinction between singularity of the covariance matrix estimates for the random effects, which will result in the conditional means being on some kind of hyperplane in one or more dimensions, and singularity or rank deficiency in the random-effects model matrix, which essentially always occurs. The random-effects model matrix for any model with a random intercept contains a set of indicators for the levels of the grouping factor. Even without considering |
Thanks @dmbates! I think it reads better now and sharpening the distinction between the two types of singularity in the RE is a nice addition. Also, I apparently cannot type |
If you're happy, then let's squash and merge (and please remove all the redundant stuff from the long-form commit message). |
Can you rebase? I tried to do so and it didn't go well.
…On Tue, Sep 15, 2020, 15:14 Phillip Alday ***@***.***> wrote:
If you're happy, then let's squash and merge (and please remove all the
redundant stuff from the long-form commit message).
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#324 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAC2UOWFNBDDZIU3SU3LW5TSF7DJJANCNFSM4NIOI3EQ>
.
|
I can, but there's no need to do so: the squash will collapse everything into a single commit and you can edit the commit message through the web interface. Should I just do that? |
Yes, pl
…On Tue, Sep 15, 2020, 15:26 Phillip Alday ***@***.***> wrote:
I can, but there's no need to do so: the squash will collapse everything
into a single commit and you can edit the commit message through the web
interface. Should I just do that?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#324 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAC2UOU2IH5SOEH4BLBNQI3SF7EZDANCNFSM4NIOI3EQ>
.
|
The Cholesky decomposition can fail and thus far it fails silently, which leads to the sporadic errors we get in testing against Julia nightly.
Possible solutions:
This initial pull is just proof-of-concept for where the Cholesky fails on the CIs. No solution is implemented yet.