
Fix the ensemble_size == 0 error in automl.py #1369

Merged: 9 commits merged into automl:development from my_new_branch on Feb 16, 2022

Conversation

@bkpcoding (Contributor) commented Jan 13, 2022

Fixing the ensemble_size == 0 error in the fit_ensemble and show_models functions by raising a ValueError in the former and issuing a warning and returning an empty dictionary in the latter. Related to issue #1365

…by adding a ValueError to the former and issuing a warning and returning an empty dictionary in the latter
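For context, a minimal sketch of the guard described above for fit_ensemble (simplified signature; the exact error message is an assumption, not the merged wording):

def fit_ensemble(self, y, ensemble_size: int = 50, **kwargs):
    # Reject an empty-ensemble request up front, before any work is done.
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be greater than 0")
    ...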
@bkpcoding bkpcoding changed the base branch from master to development January 13, 2022 10:53
@bkpcoding bkpcoding changed the title My new branch Fix the ensemble_size == 0 error in automl.py Jan 13, 2022
@bkpcoding (Contributor, Author)

Hey @eddiebergman, can you check the PR?

@eddiebergman (Contributor) left a comment

Seems mostly good :) Just a bit more explicit checking with show_models().

We should also add some brief tests to make sure these are raised.
That makes four tests, one for each kind of error/warning; these would all go in test_automl.py.

  • Test that fit_ensemble(..., ensemble_size=0) gives an error. You don't need to worry about fitting beforehand, as the check is performed before anything explicit is done.
  • Check for the fitted error in show_models(); this should be straightforward enough.
  • Check for an empty dict when show_models() is called with ensemble_size=0 passed in the construction of AutoMLClassifier.
  • The last test checks that an empty dict is returned when no models exist. We could construct an AutoMLClassifier with ensemble_size > 0, set whatever flags are needed to indicate it has been fitted so it passes the other check, and then test for the correct kind of error being raised.

For reference, if you're not familiar with testing for errors: you'll need pytest.raises(<ErrorType>, match=<msg>) to check that errors are emitted.
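To illustrate, a sketch of the first of those tests (the `automl` fixture and the matched message are assumptions, not the exact strings in the PR):

import pytest

def test_fit_ensemble_raises_with_zero_ensemble_size(automl):
    # `automl` is assumed to be a fixture providing an un-fitted AutoML
    # instance; the guard fires before `y` is used, so a placeholder is fine.
    with pytest.raises(ValueError, match="ensemble_size"):
        automl.fit_ensemble(y=None, ensemble_size=0)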

@@ -1906,6 +1907,9 @@ def show_models(self) -> Dict[int, Any]:
        """

        ensemble_dict = {}
        if self._ensemble_size == 0:
            self._logger.warning('No models in the ensemble. Kindly check the ensemble size.')
@eddiebergman (Contributor) commented on this diff:

As this is a front-facing API, this warning should be emitted to the user and not just logged silently; in fact, there's not much point in logging it silently, as it wouldn't help much with debugging.

import warnings
warnings.warn(msg)

Also, I don't think _ensemble_size is the best proxy for checking this. A user could pass ensemble_size = 10 but auto-sklearn could fail to find any models, resulting in self._ensemble_size = 10 but no models_ property.

Take a look at _load_models() for the logic of self.ensemble_ and self.models_.

While thinking about it, there are probably three separate conditions to check for:

  • Has autosklearn been fitted? If not, raise a RuntimeError.
  • Has autosklearn been given the parameter ensemble_size=0? If so, issue a warning and return an empty dict.
  • Has autosklearn been given a parameter ensemble_size > 0, but there is no ensemble to load? If so, issue a warning and return an empty dict.

I think checking self.ensemble_ for None should be good enough.

I would change it to:

# At top
import warnings

if self._ensemble_size == 0:
    ...
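Putting the three conditions together, a sketch of how the top of show_models() could look (attribute names follow the discussion above; the messages and the fitted check are placeholders, not the merged code):

import warnings
from typing import Any, Dict

def show_models(self) -> Dict[int, Any]:
    # 1. Not fitted at all: a front-facing API should fail loudly.
    if not getattr(self, "fitted", False):  # explicit flag, suggested later in the thread
        raise RuntimeError("AutoML has not been fitted yet; call fit() first.")

    # 2. The user explicitly asked for no ensemble.
    if self._ensemble_size == 0:
        warnings.warn("No models in the ensemble: ensemble_size was 0.")
        return {}

    # 3. An ensemble was requested but none could be built or loaded.
    if self.ensemble_ is None:
        warnings.warn("No ensemble found; returning an empty dictionary.")
        return {}

    ensemble_dict = {}
    ...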

@bkpcoding (Contributor, Author)

Thanks for the comments; I will try to write the tests and implement the suggestions.

codecov bot commented Jan 13, 2022

Codecov Report

Merging #1369 (ba75004) into development (8cf3d5a) will increase coverage by 0.07%.
The diff coverage is 80.00%.

@@               Coverage Diff               @@
##           development    #1369      +/-   ##
===============================================
+ Coverage        84.41%   84.48%   +0.07%     
===============================================
  Files              146      146              
  Lines            11267    11282      +15     
  Branches          1925     1929       +4     
===============================================
+ Hits              9511     9532      +21     
+ Misses            1240     1234       -6     
  Partials           516      516              

Impacted file tree graph

@eddiebergman (Contributor)

Hi @bkpcoding,

Do you plan to continue this? If not, that's okay, but I would rather close this than let it go stale.

Best,
Eddie

@bkpcoding (Contributor, Author)

Hey @eddiebergman,
I am extremely sorry; my exams are going on, but I did implement some of the changes you mentioned and will try to push them ASAP.

@eddiebergman (Contributor)

No rush!! Please focus on your exams; this can wait. Just wanted to make sure there was still some plan to do this :)

Added two tests to check that automl.fit_ensemble() raises an error when ensemble_size == 0 and that show_models() returns an empty dictionary when ensemble_size == 0
@bkpcoding (Contributor, Author)

Hey @eddiebergman, I have made the changes you requested and added two tests for them. However, it was not clear to me how to implement the points you suggested below; I tried a few things but was not able to run them successfully.

  • Has autosklearn been fitted? If not, raise a RuntimeError.
  • The last test, checking that an empty dict is returned when no models exist: constructing an AutoMLClassifier with ensemble_size > 0, setting whatever flags are needed to indicate it has been fitted so it passes the other check, and then testing for the correct kind of error being raised.

It would help if you could assist me with these. But if the changes are minor and you feel that the changes in the PR are not very important for now and might take a significant amount of your time, you can close the PR; I shall not mind.

@eddiebergman (Contributor)

eddiebergman commented Feb 14, 2022

Hi @bkpcoding, hope your exams are going well :)

I think any kind of checks help, so I'll merge it in once possible; no need to discard the work :) I'll handle the merge once I can and this is finished!

So really, the class should definitely have some kind of explicit flag indicating it has been fitted. We will handle this eventually to try to get full sklearn compatibility. The best available flag is self.models_:

  • if self.models_ is None ...

It turns out it's not very straightforward to see if autosklearn is fitted. Most places just use _load_models, but it seems that will return [] if fitted and no models were found, and still [] if fit hasn't been called.

Can you do the following? It should make the check a lot easier and allow you to easily implement "Has autosklearn been fitted? If not, raise a RuntimeError":

  • Set an attribute self.fitted = False in __init__ and self.fitted = True in fit.
  • Add a method
    def __sklearn_is_fitted__(self):
        return self.fitted
    This makes us compatible with this sklearn check.
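A self-contained sketch of that pattern and how it plays with scikit-learn's check_is_fitted (the class body here is illustrative, not auto-sklearn's actual code):

from sklearn.utils.validation import check_is_fitted

class AutoML:
    def __init__(self):
        self.fitted = False  # flipped to True once fit() completes

    def fit(self, X, y):
        # ... model search and ensemble building would happen here ...
        self.fitted = True
        return self

    def __sklearn_is_fitted__(self):
        # check_is_fitted() uses this hook when it is defined
        return self.fitted

model = AutoML()
# check_is_fitted(model)  # would raise NotFittedError at this point
model.fit(X=None, y=None)
check_is_fitted(model)  # passes now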

The last test, checking that an empty dict is returned when no models exist: constructing an AutoMLClassifier with ensemble_size > 0, setting whatever flags are needed to indicate it has been fitted so it passes the other check, and then testing for the correct kind of error being raised.
It would help if you could assist me with these. But if the changes are minor and you feel that the changes in the PR are not very important for now and might take a significant amount of your time, you can close the PR; I shall not mind.

Sorry, I'm not sure why this is needed; what's the issue that is occurring with this test? If it does not return an empty dict, then we need to fix show_models so that it does. If an error is already being raised, then this behaviour seems correct to me, and you can test for it using pytest.raises:

import pytest

def test_x():
    with pytest.raises(ErrorType, match="First few words of the error"):
        do_thing_that_raises_error()

Test checking that the show_models() function raises an error if models are not fitted.
Add a function __sklearn_is_fitted__() which returns the boolean value of self.fitted, and add the check for model fitting in the show_models() function.
@eddiebergman (Contributor)

Seems good to me; I'll resolve the conflicts and run the workflow files :) If there are any issues, you'll have to git pull and fix them, but otherwise I'm happy with it!

@mfeurer, take a look if you like.

@eddiebergman (Contributor)

eddiebergman commented Feb 16, 2022

I've done the merge; future contributions should be easier with some new tooling for formatting and checking :) If you want to run the pre-commit checks locally (the ones that normally fail -_-), you can do:

pip install pre-commit
pre-commit run --all-files

I'll check back in later today and see how the tests do.

Edit:
Seems the pre-commit workflow failed pretty quickly; this is because we now use black and isort, which were put in after your PR was started. These are becoming pretty common in the Python ecosystem, so it's good to get familiar with them :) To format the files:

pip install black isort
black autosklearn  # Runs black on all the files in `autosklearn` folder
isort autosklearn  # Same for isort

black tests  # I'm sure you can figure these out
isort tests

Again, this has been made easier: in the latest version of development you can just run make format.

@eddiebergman eddiebergman dismissed their stale review February 16, 2022 09:29

Fixed through online editor

@bkpcoding (Contributor, Author)

Hey @eddiebergman, sorry I didn't run the pre-commit tests before. But I have run them now (after formatting with black and isort) and all of them have passed. Let me know if I need to make any more changes or run any tests (I did run test_automl.py and it passed all of them).

@eddiebergman (Contributor)

Sorry, rerunning tests; it seems one of them had a pip installation issue, but it appears unrelated to this PR (timeouts when trying to find a requirement). I reran them just to be sure.

@eddiebergman eddiebergman merged commit 45d3ff8 into automl:development Feb 16, 2022
@eddiebergman (Contributor)

Hey @bkpcoding,

Thanks very much for contributing! I've merged it, so congrats :)

If you have any comments on how the contribution process was for you, we'd be happy to hear any thoughts you might have to help us improve for the future!

  • How was interacting with our codebase/tests/GitHub actions?
  • Did you use the Contribution guide and is there anything you'd change with it?
  • How was the response/feedback during the contribution?
  • Any thoughts on how we could improve for future contributions?
  • Would you be interested in contributing again in any capacity?

You can leave it here as a comment or else you can find my email on my GitHub profile if you'd prefer to do so privately!

Thanks again!

Best,
Eddie

@bkpcoding (Contributor, Author)

Hey @eddiebergman, thanks for merging the PR and for all the trouble you took with the review and suggestions. I think the entire codebase is very well written, and after reading the paper and going through the documentation I was pretty much able to grasp the implementation of the library.
The contribution guide was extremely useful; I was able to set up the development environment without any hassle.
Any problems I faced were due to my incomplete understanding of git and git commands; however, a quick Google search sorted them out.
I would love to contribute to the library as much as I can. Sorry for the late replies, as I had exams, but now that they have ended I will be checking the issues regularly.

@bkpcoding bkpcoding deleted the my_new_branch branch February 18, 2022 19:19
@eddiebergman (Contributor)

Great to hear :) The code base can always be improved but step by step!

Git is always one of those things that takes time; I still don't know much of its functionality, but there's always plenty more to learn. If there's anything you found helpful to know about git, feel free to throw it into the Contribution guide.

If you like, I can tag you on issues that I don't think require going too deep or require extensive changes, otherwise feel free to poke around whenever you feel like it!

Best of luck with the exams!

Best,
Eddie

@bkpcoding (Contributor, Author)

Sure, you can tag me on the issues you think would be relevant to me; I would be happy to look at them.

eddiebergman added a commit that referenced this pull request Aug 18, 2022
* Fix the ensemble_size == 0 error in the fit_ensemble and show_models functions by adding a ValueError to the former and issuing a warning and returning an empty dictionary in the latter

* Update automl.py

* Two tests for ensemble_size == 0 cases

Added two tests to check that automl.fit_ensemble() raises an error when ensemble_size == 0 and that show_models() returns an empty dictionary when ensemble_size == 0

* Update automl.py

* Update test_automl.py

Test checking that the show_models() function raises an error if models are not fitted.

* Update automl.py

Add a function __sklearn_is_fitted__() which returns the boolean value of self.fitted, and add the check for model fitting in the show_models() function.

* Update autosklearn/automl.py

* Formatting changes to clear all the pre-commit tests

Co-authored-by: Eddie Bergman <eddiebergmanhs@gmail.com>