Isolating issues in handling rank-deficiency #324

palday · 2020-05-23T13:24:44Z

The Cholesky decomposition can fail, and so far it fails silently, which leads to the sporadic errors we get when testing against Julia nightly.
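Below is a minimal sketch of what "failing silently" means in Julia's LinearAlgebra. It is not the code from this PR. When you pass `check = false`, `cholesky` returns a factorization object even if the factorization breaks down, so the caller has to test the result with `issuccess`. The 2×2 matrix is a hypothetical stand-in for a singular cross-product matrix.

```julia
using LinearAlgebra

# Hypothetical stand-in for a singular cross-product matrix X'X: the two
# underlying columns are identical, so A is positive semi-definite but singular.
A = [1.0 1.0;
     1.0 1.0]

# With check = false a failed factorization does not throw an exception ...
F = cholesky(Symmetric(A); check = false)

# ... so the failure has to be detected explicitly.
if !issuccess(F)
    @warn "Cholesky factorization failed" info = F.info
end
```

Without `check = false`, the same call throws a `PosDefException` instead.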

Possible solutions:

- Use the SVD or QR decomposition to compute the pivot
- Use the SVD for more robust rank detection
- ????
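The second option can be sketched as SVD-based rank detection with a relative tolerance `rtol`: count the singular values that exceed `rtol` times the largest one. The helper name `svd_rank` and the default `1e-8` are illustrative choices. MixedModels does not provide either of them.

```julia
using LinearAlgebra

# Hypothetical helper: numerical rank from the singular values, counting
# those above `rtol` times the largest one.
function svd_rank(X::AbstractMatrix{<:Real}; rtol::Real = 1e-8)
    s = svdvals(X)                  # sorted in decreasing order
    isempty(s) && return 0
    return count(>(rtol * s[1]), s)
end

x = collect(1.0:5.0)
X = [ones(5) x (2 .* x)]            # third column is 2 × the second
svd_rank(X)                         # 2
```

`LinearAlgebra.rank(X; rtol = 1e-8)` applies the same rule. However, the SVD only reports the rank. It does not say *which* columns to drop, so it would still have to be paired with a pivoting strategy.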

This initial pull request is just a proof of concept showing where the Cholesky factorization fails on CI. No solution is implemented yet.

codecov · 2020-05-23T15:02:24Z

Codecov Report

Merging #324 into master will decrease coverage by 0.58%.
The diff coverage is 97.56%.

@@            Coverage Diff             @@
##           master     #324      +/-   ##
==========================================
- Coverage   95.44%   94.86%   -0.59%     
==========================================
  Files          23       23              
  Lines        1515     1576      +61     
==========================================
+ Hits         1446     1495      +49     
- Misses         69       81      +12

Impacted Files	Coverage Δ
src/MixedModels.jl	`100.00% <ø> (ø)`
src/linalg/pivot.jl	`97.05% <97.05%> (ø)`
src/femat.jl	`100.00% <100.00%> (ø)`
src/schema.jl	`75.00% <0.00%> (-25.00%)`	⬇️
src/mixedmodel.jl	`80.95% <0.00%> (-19.05%)`	⬇️
src/linalg/rankUpdate.jl	`95.23% <0.00%> (-2.33%)`	⬇️
src/arraytypes.jl	`91.30% <0.00%> (-2.03%)`	⬇️
src/generalizedlinearmixedmodel.jl	`82.32% <0.00%> (-1.09%)`	⬇️
src/linearmixedmodel.jl	`98.91% <0.00%> (-0.27%)`	⬇️
src/remat.jl	`95.25% <0.00%> (-0.13%)`	⬇️
... and 8 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c7c73b0...f1b8bdb.

palday · 2020-05-24T13:26:58Z

The Cholesky test just passed on the nightly CI with this versioninfo:

Julia Version 1.6.0-DEV.85
Commit 0413ef0e4d (2020-05-24 02:53 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, broadwell)

palday · 2020-05-24T13:55:21Z

And now from a failed run:

Julia Version 1.6.0-DEV.85
Commit 0413ef0e4d (2020-05-24 02:53 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, skylake)

palday · 2020-05-24T14:01:59Z

So it really does look like the interaction of Intel Skylake and the development version of Julia is causing this problem.

@andreasnoack Has anything changed in LinearAlgebra since 1.4 that would make the linear algebra more sensitive to processor details?

palday · 2020-05-24T14:20:20Z

Skylake processors do pass on Julia 1.4 / the older LLVM:

Julia Version 1.4.1
Commit 381693d3df* (2020-04-14 17:20 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, skylake)

palday · 2020-05-25T12:34:19Z

So it's not just the newer LLVM version on Skylake. I found a non-Xeon Skylake machine to run the tests on, and they don't fail there (openSUSE 15.0):

Julia Version 1.6.0-DEV.90
Commit c832e47a0a (2020-05-25 08:56 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, skylake)
Environment:                                                                                                          
  JULIA_LOAD_PATH = @:/tmp/jl_Py0VU3

But even on the same nightly, they do fail on Skylake Xeon (Ubuntu 18.04):

Julia Version 1.6.0-DEV.90
Commit c832e47a0a (2020-05-25 08:56 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, skylake)
Environment:
  JULIA_LOAD_PATH = @:/tmp/jl_6QaBGU

Meanwhile, the Broadwell CI machines continue to pass.

andreasnoack · 2020-05-25T12:48:04Z

> @andreasnoack Has anything changed in LinearAlgebra since 1.4 that would make the linear algebra more sensitive to processor details?

It might be the version of OpenBLAS. We upgraded recently, and that could affect AVX512 systems. Try comparing

julia> using LinearAlgebra

julia> BLAS.openblas_get_config()
"OpenBLAS 0.3.5  USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell MAX_THREADS=32"

There have been issues with the AVX512 kernels in OpenBLAS in the past, but it might also be that you need to use a different tolerance when determining the rank.
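The following sketch illustrates the point about tolerance using a hypothetical, nearly collinear design. The third column differs from the second only by a perturbation on the order of `1e-9`. Whether the matrix counts as rank 2 or rank 3 depends entirely on the relative tolerance. Rounding differences between BLAS kernels (for example Haswell vs SkylakeX) act like a much smaller perturbation of the same kind, so a tolerance close to machine precision can flip the answer.

```julia
using LinearAlgebra

# Hypothetical nearly collinear design: column 3 = column 2 + a tiny perturbation.
x = collect(1.0:5.0)
delta = 1e-9 .* [1.0, -1.0, 1.0, -1.0, 1.0]
X = [ones(5) x (x .+ delta)]

rank(X)                 # 3: with the default rtol (about 3 * eps) the perturbation counts
rank(X; rtol = 1e-6)    # 2: with a looser tolerance the columns are treated as collinear
```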

palday · 2020-05-25T12:59:12Z

Skylake machine where the tests passed:

"OpenBLAS 0.3.9  USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell MAX_THREADS=32"

Skylake where the tests failed:

"OpenBLAS 0.3.9  USE64BITINT DYNAMIC_ARCH NO_AFFINITY SkylakeX MAX_THREADS=32"

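@andreasnoack It does look like OpenBLAS is playing a role here. Both machines run OpenBLAS 0.3.9, but the failing one uses the SkylakeX kernels instead of the Haswell ones.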

andreasnoack · 2020-05-25T13:05:49Z

You can try starting Julia with `OPENBLAS_CORETYPE=haswell julia` on the SkylakeX machine to verify that restricting OpenBLAS to the Haswell kernels actually fixes the issue. But again, it might just be differences in rounding, if the tolerance is too strict.

palday · 2020-08-08T08:58:20Z

There's a middle ground here that I can implement: we just wrap the BLAS pivoted Cholesky so that the order of the linearly independent columns isn't changed and the linearly dependent columns are put all the way to the right. In other words, we let BLAS decide which columns are redundant, but otherwise preserve order.
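Here is a minimal sketch of this middle ground. It assumes Julia ≥ 1.8, where you request the pivoted Cholesky with `RowMaximum()`. Julia versions from 2020 used `Val(true)` instead. The helper name `stable_cholesky_pivot` and the relative tolerance are hypothetical. LAPACK receives the tolerance as an absolute bound on the pivots of `X'X`, so it applies to squared column norms.

```julia
using LinearAlgebra

"""
    stable_cholesky_pivot(X; rtol = 1e-8)

Hypothetical helper: let LAPACK's pivoted Cholesky of `X'X` decide which
columns are redundant, but return a permutation that keeps the linearly
independent columns in their original order and moves the redundant ones
to the end. Returns `(perm, rank)`.
"""
function stable_cholesky_pivot(X::AbstractMatrix{<:Real}; rtol::Real = 1e-8)
    XtX = Symmetric(X'X)
    # Absolute pivot tolerance, relative to the largest diagonal element.
    tol = rtol * maximum(diag(XtX))
    # check = false: rank deficiency is expected here, not an error.
    C = cholesky(XtX, RowMaximum(); tol = tol, check = false)
    r = C.rank
    keep = sort(C.p[1:r])           # independent columns, original order
    drop = sort(C.p[(r + 1):end])   # redundant columns, moved to the end
    return vcat(keep, drop), r
end

x = collect(1.0:5.0)
X = [ones(5) x (2 .* x)]
stable_cholesky_pivot(X)            # ([1, 3, 2], 2)
```

In this example the column `x` (column 2) is flagged as redundant rather than `2x`. That happens because the pivoted Cholesky takes the column with the largest norm first. So which column of a collinear set gets dropped depends on scaling, a point that comes up again later in the thread.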

palday · 2020-08-08T09:03:51Z

I wrote that last comment and then ran some tests, and it seems that the original test case that started all of this is so ill-conditioned that even BLAS delivers different answers on different architectures as to which columns should be pivoted. But then again, so does the QR decomposition. In other words, swapping to QR won't save us here.

So either we have a really ill-conditioned example, or there is a latent issue with the use of AVX instructions in OpenBLAS, or both.
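One way to see why swapping to QR does not help: in exact arithmetic, column-pivoted QR of `X` and pivoted Cholesky of `X'X` make the same pivot choices. They also produce the same triangular factor up to the signs of its rows, since `R'R = P'X'XP = U'U`. The sketch below checks this on a hypothetical full-rank quadratic design. It assumes Julia ≥ 1.8 for `ColumnNorm()` and `RowMaximum()`.

```julia
using LinearAlgebra

# Hypothetical full-rank design with well-separated column norms.
x = collect(1.0:5.0)
X = [ones(5) x (x .^ 2)]

F = qr(X, ColumnNorm())                        # column-pivoted QR of X
C = cholesky(Symmetric(X'X), RowMaximum())     # pivoted Cholesky of X'X

F.p == C.p                                            # true: same pivot order
isapprox(abs.(F.R), abs.(Matrix(C.U)); rtol = 1e-8)   # true: same factor up to row signs
```

When the column norms are well separated, both factorizations agree. In an ill-conditioned case they can diverge only through rounding, and rounding is where architecture-dependent results come from.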

Nosferican · 2020-08-08T09:12:55Z

I guess we should clearly define what the best behavior is. Do we want to keep the full-rank version based on the column ordering? Do we want to make sure we always prefer the intercept? Do we want to keep the full set of columns for a categorical variable rather than just part of it? Sometimes in applied work one wants to make sure to include the feature of interest, yes, but other than that and the intercept I am not sure. I really dislike Stata's random algorithm for choosing which columns to drop, so having the choice be deterministic to some extent seems desirable. I would maybe use some measure of variability in the column as a way to "prioritize" the selected columns.
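Below is a sketch of a deterministic, priority-driven alternative along these lines. It uses a swapped-in technique, greedy Gram-Schmidt selection in a user-given order. This is not what the PR implements, and the helper name and tolerance are hypothetical. The procedure walks through the columns in priority order and keeps a column only if it adds a direction not already spanned by the kept ones.

```julia
using LinearAlgebra

"""
    ordered_independent_columns(X, order; rtol = 1e-8)

Hypothetical helper: visit the columns of `X` in the priority `order` and keep
a column only if its component orthogonal to the already-kept columns has a
norm greater than `rtol` times the column's own norm.
"""
function ordered_independent_columns(X::AbstractMatrix{<:Real},
                                     order::AbstractVector{<:Integer};
                                     rtol::Real = 1e-8)
    Q = Vector{Vector{Float64}}()   # orthonormal basis of the kept columns
    kept = Int[]
    for j in order
        v = Vector{Float64}(X[:, j])
        nv = norm(v)
        nv == 0 && continue          # an all-zero column is always redundant
        # Modified Gram-Schmidt against the kept basis, applied twice for stability.
        for _ in 1:2, q in Q
            v .-= dot(q, v) .* q
        end
        if norm(v) > rtol * nv
            push!(Q, v ./ norm(v))
            push!(kept, j)
        end
    end
    return kept
end

x = collect(1.0:5.0)
X = [ones(5) x (2 .* x)]
ordered_independent_columns(X, [1, 2, 3])   # [1, 2]: x is kept, 2x is dropped
ordered_independent_columns(X, [1, 3, 2])   # [1, 3]: 2x is kept, x is dropped
```

The result depends only on `order` and the tolerance, not on column norms. The price is that an early column in the order is kept even if a later one would be numerically better conditioned.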

andreasnoack · 2020-08-10T09:58:25Z

So, a couple of thoughts I had after rereading this issue:

- Mixing rank determination strategies is not a good idea, as they can easily disagree. Any solution should be based on a single procedure. (As I read the thread, this seems to be the consensus as well.)
- The pivoted QR and pivoted Cholesky are equivalent in exact arithmetic, so I believe it's expected that they reach the same conclusion whenever you use the default tolerance. With a lower tolerance, the QR should be able to be more precise than the Cholesky. However, I don't think the extra precision would be useful in this application. (I might be wrong.)
- It's probably not possible to ensure that the rank determination is consistent across architectures in all cases. However, it should be consistent in most real applications, so it might make sense to adjust the test if it's pathological.
- Maybe you want to consider a higher tolerance than the default one. It should make it less likely that noise accidentally increases the rank, with a large effect on the coefficients. I guess it would also make the results less sensitive to architecture.
- I'm wondering if it would be possible to get results that are easier to interpret with the pivoted Cholesky if the variables were scaled differently. I think the intercept is lost because higher-order terms end up with a larger column norm than the intercept, but since the intercept column doesn't have to be all ones, it's, in some sense, a self-created problem. Make the intercept 1e6 and it will have the largest norm and probably not get dropped. Hence, it might be possible to use a scaling that ensures lower-order terms have a large norm and then apply the inverse scaling after the rank determination.
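The scaling idea can be sketched as follows, assuming Julia ≥ 1.8 and a hypothetical helper `cholesky_kept`. In the hypothetical design below, `x + y` equals 100 in every row, which makes the intercept redundant. Because `y` has the largest norm, the unscaled pivoted Cholesky drops the intercept. Scaling the intercept up makes it the first pivot. Only column indices are selected, so undoing the scaling afterwards amounts to indexing the original `X`.

```julia
using LinearAlgebra

# Hypothetical helper: sorted indices of the columns kept by a pivoted Cholesky
# of X'X, with the pivot tolerance relative to the largest diagonal element.
function cholesky_kept(X::AbstractMatrix{<:Real}; rtol::Real = 1e-8)
    XtX = Symmetric(X'X)
    C = cholesky(XtX, RowMaximum(); tol = rtol * maximum(diag(XtX)), check = false)
    return sort(C.p[1:C.rank])
end

# Hypothetical design in which the intercept is redundant: x + y == 100.
x = [10.0, 20.0, 30.0, 40.0, 50.0]
y = 100 .- x
X = [ones(5) x y]

cholesky_kept(X)                  # [2, 3]: the intercept is the column dropped

# Scale the intercept column up, determine which columns to keep, and then
# use those indices on the unscaled X.
s = [1e3, 1.0, 1.0]
kept = cholesky_kept(X .* s')     # the intercept is now pivoted first and kept
Xkept = X[:, kept]
```

The scale factor interacts with a tolerance that is relative to the largest diagonal element. With a factor of `1e6` instead of `1e3`, the remaining pivots (1000 here) fall below `1e-8 * 5e12`, and the rank comes out as 1. Any scaling scheme would need its tolerance chosen with that in mind.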

palday · 2020-09-14T20:45:35Z

@dmbates If you care to review, we can be done with this mess! The pivoted QR is now what we're using; I've left the pivoted Cholesky in for now and made it clear in the docs that we make no promise that we'll keep using the same algorithm for rank determination and pivoting.
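Here is a minimal sketch of rank determination with column-pivoted QR, combined with the order-preserving reordering discussed earlier in the thread. It is an illustration under stated assumptions: Julia ≥ 1.7 for `ColumnNorm()`, a hypothetical helper `stable_qr_pivot`, and a relative tolerance on `abs.(diag(R))`. It is not the implementation that was merged into MixedModels.

```julia
using LinearAlgebra

"""
    stable_qr_pivot(X; rtol = 1e-8)

Hypothetical helper: numerical rank from a column-pivoted QR of `X`, counting
diagonal elements of `R` whose magnitude exceeds `rtol * abs(R[1, 1])`. Keeps
the independent columns in their original order and moves the redundant ones
to the end. Returns `(perm, rank)`.
"""
function stable_qr_pivot(X::AbstractMatrix{<:Real}; rtol::Real = 1e-8)
    F = qr(X, ColumnNorm())         # Julia ≥ 1.7; older versions used Val(true)
    d = abs.(diag(F.R))             # non-increasing for column-pivoted QR
    r = isempty(d) ? 0 : count(>(rtol * d[1]), d)
    return vcat(sort(F.p[1:r]), sort(F.p[(r + 1):end])), r
end

x = collect(1.0:5.0)
X = [ones(5) x (2 .* x)]
stable_qr_pivot(X)                  # ([1, 3, 2], 2)
```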

Do check out the docs -- they're not my best explanation ever, but if they're good enough and I haven't been infelicitous in my simplifications, we can merge this change in and improve the docs later. Then the CIs will all be happy again.

dmbates · 2020-09-15T18:18:43Z

I have made some suggested edits in rankdeficiency.md. These are suggestions only.

One thing I did try to do is distinguish more clearly between singularity of the covariance matrix estimates for the random effects, which will result in the conditional means lying on some kind of hyperplane in one or more dimensions, and singularity or rank deficiency in the random-effects model matrix, which essentially always occurs. The random-effects model matrix for any model with a random intercept contains a set of indicators for the levels of the grouping factor. Even without considering `fulldummy`, these columns sum to an intercept column, which makes the concatenation of X and Z rank deficient. So we always have some type of rank deficiency in the random-effects model matrix, but it is not a problem because of the shrinkage or regularization.
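You can check dmbates's point directly. The sketch below uses a hypothetical grouping factor with three levels. The indicator columns of `Z` sum to the intercept column of `X`, so `[X Z]` has four columns but rank 3.

```julia
using LinearAlgebra

# Hypothetical grouping factor: three levels, two observations per level.
g = [1, 1, 2, 2, 3, 3]
X = ones(length(g), 1)                                    # fixed-effects intercept
Z = Float64[g[i] == k for i in eachindex(g), k in 1:3]    # indicator columns

sum(Z; dims = 2) == X       # true: the indicators sum to the intercept column
rank([X Z])                 # 3, although [X Z] has 4 columns
```

This deficiency is harmless in the mixed model because the random effects are shrunk (penalized), so pivoting never needs to resolve it.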

palday · 2020-09-15T20:13:06Z

Thanks @dmbates! I think it reads better now and sharpening the distinction between the two types of singularity in the RE is a nice addition.

Also, I apparently cannot type redudant.

palday · 2020-09-15T20:13:57Z

If you're happy, then let's squash and merge (and please remove all the redundant stuff from the long-form commit message).

dmbates · 2020-09-15T20:16:16Z

Can you rebase? I tried to do so and it didn't go well.


palday · 2020-09-15T20:26:41Z

I can, but there's no need to do so: the squash will collapse everything into a single commit and you can edit the commit message through the web interface. Should I just do that?

dmbates · 2020-09-15T20:28:21Z

Yes, please.


palday added 6 commits May 23, 2020 15:10

e818d37 debugging assertion for sporadic cholesky test failures
55d740f assertion for failed Cholesky factorization
ade6f64 actually use the tolerance specified in the function call
9b1ed31 manual check for pivoted factorization (pending JuliaLang/julia#36002)
3e358f3 use built-in rank method
fdf96f0 dummy info for pivoted

palday added 6 commits May 23, 2020 17:09

9f9addd scale things
e7d3601 debug output for rank calc
aa701e7 remove Tier2 while debugging (ROLL BACK THIS COMMIT BEFORE MERGING)
4380792 dump version info
5ca0cd5 import versioninfo
d8d5b3b add InteractiveUtils to deps

palday added 2 commits May 24, 2020 15:38

35f375c include versioninfo in tests
cc772f3 nicer printing of rank

palday added 4 commits May 24, 2020 16:02

2999c53 remove cruft
a72f086 only print the rank info when they don't match
64677bd make z not a constant multiple of y
be3dfa5 don't forget zero
8407706 get BLAS config

palday added 2 commits May 25, 2020 15:33

90fc6f0 enviroment variable for openblas coretype
5757a65 try setting the BLAS enviroment variable differently
4daabb3 statsqr

palday mentioned this pull request Aug 27, 2020: Fix grouping name and amalgamate for interaction grouping terms #361 (Merged)

palday mentioned this pull request Sep 9, 2020: Fix for one tiny aspect of the rank deficiency issues #367 (Merged)

palday added 12 commits September 14, 2020 10:20

3c1cba1 Merge branch 'master' of github.com:JuliaStats/MixedModels.jl into pivot
80c9cc7 doc updates for rank deficiency
fc1ebd6 load mixedmodels in rankdef docs
cf0b0c5 docstring attempt
64fba6e more work on docstrings
4c18189 you mock me, documenter
72dbbc1 will this be the commit that gets the docstring right?
88d6b4c get documented, please/
6743317 another attempt at docs
680a5c4 floating point tolerance for doctest
f6afe52 add fixefnames and coefnames to docs
99e552f rework text to avoid need for docstring; x-ref
f1b8bdb Suggested edits.

palday merged commit d601cdc into master Sep 15, 2020

ararslan deleted the pivot branch September 15, 2020 20:32

palday mentioned this pull request Apr 28, 2021: Incorrect linear regression results JuliaStats/GLM.jl#426 (Open)
