Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Collinearity Speed Up v2 #421

Merged
merged 19 commits into from
Jul 15, 2022
Merged

Collinearity Speed Up v2 #421

merged 19 commits into from
Jul 15, 2022

Conversation

ChadGueli
Copy link
Contributor

@ChadGueli ChadGueli commented Jun 21, 2022

Summary

Again, this is a major speed up. It eliminates recomputation along rows, saving a huge amount of time.

The VIFs are computed using the correlation formula, but optimized with only the diagonal inverted.

Tests

It passes all of the unittest tests. But the test_vif_calculation test is removed. I would be happy to add another that checks the current formulation.

cc:
@tommydangerous @skunichetty

@skunichetty
Copy link
Contributor

Sorry for the late response, is there any reason you switched back to VIFs from VDFs?

@ChadGueli
Copy link
Contributor Author

ChadGueli commented Jun 23, 2022

Yeah, I took a closer look at the tests, and realized that in cases of perfect collinearity, the VDPs are uniformly split between the columns. This creates a problem if you have more than 3 perfectly collinear columns. Initially I thought that nothing is ever perfectly collinear so it doesn't matter, but then I realized that the test situation is highly plausible especially if someone doesn't know what they're doing or doesn't understand their data.

If you use VDPs, I think its best to use them in conjunction with another indicator of collinearity. The VDP formula and VIF formula do have a lot in common, so it could be wise to use them in conjunction.

Copy link
Contributor

@skunichetty skunichetty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks great! Once you resolve my comments I can approve for merging, or if you want to investigate a joint approach using VDPs you can add on as well.

@tommydangerous
Copy link
Member

@ChadGueli @skunichetty is this still good to go?

@ChadGueli
Copy link
Contributor Author

Yes, it should be.

@skunichetty
Copy link
Contributor

skunichetty commented Jul 14, 2022

It's been a while since I last looked at this, but it should be good - i'll give it the go ahead.

@skunichetty skunichetty merged commit 0b1f04b into mage-ai:master Jul 15, 2022
@ChadGueli ChadGueli deleted the mcollin_prob branch July 21, 2022 17:27
@ChadGueli ChadGueli restored the mcollin_prob branch July 25, 2022 19:44
@ChadGueli ChadGueli deleted the mcollin_prob branch July 25, 2022 19:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants