
Fix the ensemble_size == 0 error in automl.py #1369

Merged: 9 commits merged into automl:development from my_new_branch on Feb 16, 2022

Conversation

@bkpcoding (Contributor) commented Jan 13, 2022

Fixing the ensemble_size == 0 error in the fit_ensemble and show_models functions by raising a ValueError in the former and issuing a warning and returning an empty dictionary in the latter. Related to issue #1365

…by adding a ValueError to the former and issuing a warning and returning an empty dictionary in the latter
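For context, a minimal sketch of the guard described above for fit_ensemble (simplified signature; the exact error message is an assumption, not the merged wording):

def fit_ensemble(self, y, ensemble_size: int = 50, **kwargs):
    # Reject an empty-ensemble request up front, before any work is done.
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be greater than 0")
    ...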
@bkpcoding bkpcoding changed the base branch from master to development January 13, 2022 10:53
@bkpcoding bkpcoding changed the title My new branch Fix the ensemble_size == 0 error in automl.py Jan 13, 2022
@bkpcoding (Contributor, Author)

Hey @eddiebergman, can you check the PR?

@eddiebergman (Contributor) left a comment

Seems mostly good :) Just a bit more explicit checking with show_models().

We should also add some brief tests to make sure these are raised.
That makes four tests, one for each kind of error/warning; these would all go in test_automl.py.

  • Test that fit_ensemble(..., ensemble_size=0) gives an error. You don't need to worry about fitting beforehand, as the check is performed before anything explicit is done.
  • Check for the fitted error in show_models(); this should be straightforward enough.
  • Check for an empty dict when show_models() is called with ensemble_size=0 passed in the construction of AutoMLClassifier.
  • The last test checks that an empty dict is returned when no models exist. We could construct an AutoMLClassifier with ensemble_size > 0, set whatever flags are needed to indicate it has been fitted so it passes the other check, and then test for the correct kind of error being raised.

For reference, if you're not familiar with testing for errors: you'll need pytest.raises(<ErrorType>, match=<msg>) to check that errors are emitted.
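To illustrate, a sketch of the first of those tests (the `automl` fixture and the matched message are assumptions, not the exact strings in the PR):

import pytest

def test_fit_ensemble_raises_with_zero_ensemble_size(automl):
    # `automl` is assumed to be a fixture providing an un-fitted AutoML
    # instance; the guard fires before `y` is used, so a placeholder is fine.
    with pytest.raises(ValueError, match="ensemble_size"):
        automl.fit_ensemble(y=None, ensemble_size=0)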

@@ -1906,6 +1907,9 @@ def show_models(self) -> Dict[int, Any]:
        """

        ensemble_dict = {}
        if self._ensemble_size == 0:
            self._logger.warning('No models in the ensemble. Kindly check the ensemble size.')
@eddiebergman (Contributor) commented on this diff:

As this is a front-facing API, this warning should be emitted to the user and not just logged silently; in fact, there's not much point in logging it silently, as it wouldn't help much with debugging.

import warnings
warnings.warn(msg)

Also, I don't think _ensemble_size is the best proxy for checking this. A user could pass ensemble_size = 10 but auto-sklearn could fail to find any models, resulting in self._ensemble_size = 10 but no models_ property.

Take a look at _load_models() for the logic of self.ensemble_ and self.models_.

While thinking about it, there are probably three separate conditions to check for:

  • Has autosklearn been fitted? If not, raise a RuntimeError.
  • Has autosklearn been given the parameter ensemble_size=0? If so, issue a warning and return an empty dict.
  • Has autosklearn been given a parameter ensemble_size > 0, but there is no ensemble to load? If so, issue a warning and return an empty dict.

I think checking self.ensemble_ for None should be good enough.

I would change it to:

# At top
import warnings

if self._ensemble_size == 0:
    ...
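Putting the three conditions together, a sketch of how the top of show_models() could look (attribute names follow the discussion above; the messages and the fitted check are placeholders, not the merged code):

import warnings
from typing import Any, Dict

def show_models(self) -> Dict[int, Any]:
    # 1. Not fitted at all: a front-facing API should fail loudly.
    if not getattr(self, "fitted", False):  # explicit flag, suggested later in the thread
        raise RuntimeError("AutoML has not been fitted yet; call fit() first.")

    # 2. The user explicitly asked for no ensemble.
    if self._ensemble_size == 0:
        warnings.warn("No models in the ensemble: ensemble_size was 0.")
        return {}

    # 3. An ensemble was requested but none could be built or loaded.
    if self.ensemble_ is None:
        warnings.warn("No ensemble found; returning an empty dictionary.")
        return {}

    ensemble_dict = {}
    ...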

@bkpcoding (Contributor, Author)

Thanks for the comments; I will try to write the tests and implement the suggestions.

codecov bot commented Jan 13, 2022

Codecov Report

Merging #1369 (ba75004) into development (8cf3d5a) will increase coverage by 0.07%.
The diff coverage is 80.00%.

@@               Coverage Diff               @@
##           development    #1369      +/-   ##
===============================================
+ Coverage        84.41%   84.48%   +0.07%     
===============================================
  Files              146      146              
  Lines            11267    11282      +15     
  Branches          1925     1929       +4     
===============================================
+ Hits              9511     9532      +21     
+ Misses            1240     1234       -6     
  Partials           516      516              

Impacted file tree graph

@eddiebergman (Contributor)

Hi @bkpcoding,

Do you plan to continue this? If not, that's okay, but I would rather close this than let it go stale.

Best,
Eddie

@bkpcoding (Contributor, Author)

Hey @eddiebergman,
I am extremely sorry; my exams are going on, but I did implement some of the changes you mentioned and will try to push them ASAP.

@eddiebergman (Contributor)

No rush!! Please focus on your exams; this can wait. Just wanted to make sure there was still some plan to do this :)

Added two tests to check that automl.fit_ensemble() raises an error when ensemble_size == 0 and that show_models() returns an empty dictionary when ensemble_size == 0
@bkpcoding (Contributor, Author)

Hey @eddiebergman, I have made the changes you requested and added two tests for them. However, it was not clear to me how to implement the points you suggested below; I tried a few things but was not able to run them successfully.

  • Has autosklearn been fitted? If not, raise a RuntimeError.
  • The last test, checking that an empty dict is returned when no models exist: constructing an AutoMLClassifier with ensemble_size > 0, setting whatever flags are needed to indicate it has been fitted so it passes the other check, and then testing for the correct kind of error being raised.

It would help if you could assist me with these. But if the changes are minor and you feel that the changes in the PR are not very important for now and might take a significant amount of your time, you can close the PR; I shall not mind.

@eddiebergman (Contributor)

eddiebergman commented Feb 14, 2022

Hi @bkpcoding, hope your exams are going well :)

I think any kind of checks help, so I'll merge it in once possible; no need to discard the work :) I'll handle the merge once I can and this is finished!

So really, the class should definitely have some kind of explicit flag indicating it has been fitted. We will handle this eventually to try to get full sklearn compatibility. The best available flag is self.models_:

  • if self.models_ is None ...

It turns out it's not very straightforward to see if autosklearn is fitted. Most places just use _load_models, but it seems that will return [] if fitted and no models were found, and still [] if fit hasn't been called.

Can you do the following? It should make the check a lot easier and allow you to easily implement "Has autosklearn been fitted? If not, raise a RuntimeError":

  • Set an attribute self.fitted = False in __init__ and self.fitted = True in fit.
  • Add a method
    def __sklearn_is_fitted__(self):
        return self.fitted
    This makes us compatible with this sklearn check.
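A self-contained sketch of that pattern and how it plays with scikit-learn's check_is_fitted (the class body here is illustrative, not auto-sklearn's actual code):

from sklearn.utils.validation import check_is_fitted

class AutoML:
    def __init__(self):
        self.fitted = False  # flipped to True once fit() completes

    def fit(self, X, y):
        # ... model search and ensemble building would happen here ...
        self.fitted = True
        return self

    def __sklearn_is_fitted__(self):
        # check_is_fitted() uses this hook when it is defined
        return self.fitted

model = AutoML()
# check_is_fitted(model)  # would raise NotFittedError at this point
model.fit(X=None, y=None)
check_is_fitted(model)  # passes now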

The last test, checking that an empty dict is returned when no models exist: constructing an AutoMLClassifier with ensemble_size > 0, setting whatever flags are needed to indicate it has been fitted so it passes the other check, and then testing for the correct kind of error being raised.
It would help if you could assist me with these. But if the changes are minor and you feel that the changes in the PR are not very important for now and might take a significant amount of your time, you can close the PR; I shall not mind.

Sorry, I'm not sure why this is needed; what's the issue that is occurring with this test? If it does not return an empty dict, then we need to fix show_models so that it does. If an error is already being raised, then this behaviour seems correct to me, and you can test for it using pytest.raises:

import pytest

def test_x():
    with pytest.raises(ErrorType, match="First few words of the error"):
        do_thing_that_raises_error()

Test checking that the show_models() function raises an error if models are not fitted.
Add a function __sklearn_is_fitted__() which returns the boolean value of self.fitted, and add the check for model fitting in the show_models() function.
@eddiebergman (Contributor)

Seems good to me; I'll resolve the conflicts and run the workflow files :) If there are any issues, you'll have to git pull and fix them, but otherwise I'm happy with it!

@mfeurer, take a look if you like.

@eddiebergman (Contributor)

eddiebergman commented Feb 16, 2022

I've done the merge; future contributions should be easier with some new tooling for formatting and checking :) If you want to run the pre-commit checks locally (the ones that normally fail -_-), you can do:

pip install pre-commit
pre-commit run --all-files

I'll check back in later today and see how the tests do.

Edit:
Seems the pre-commit workflow failed pretty quickly; this is because we now use black and isort, which were put in after your PR was started. These are becoming pretty common in the Python ecosystem, so it's good to get familiar with them :) To format the files:

pip install black isort
black autosklearn  # Runs black on all the files in `autosklearn` folder
isort autosklearn  # Same for isort

black tests  # I'm sure you can figure these out
isort tests

Again, this has been made easier: in the latest version of development you can just run make format.

@eddiebergman eddiebergman dismissed their stale review February 16, 2022 09:29

Fixed through online editor

@bkpcoding (Contributor, Author)

Hey @eddiebergman, sorry I didn't run the pre-commit tests before. But I have run them now (after formatting with black and isort) and all of them have passed. Let me know if I need to make any more changes or run any tests (I did run test_automl.py and it passed all of them).

@eddiebergman (Contributor)

Sorry, rerunning tests; it seems one of them had a pip installation issue, but it appears unrelated to this PR (timeouts when trying to find a requirement). I reran them just to be sure.

@eddiebergman eddiebergman merged commit 45d3ff8 into automl:development Feb 16, 2022
@eddiebergman (Contributor)

Hey @bkpcoding,

Thanks very much for contributing! I've merged it, so congrats :)

If you have any comments on how the contribution process was for you, we'd be happy to hear any thoughts you might have to help us improve for the future!

  • How was interacting with our codebase/tests/GitHub actions?
  • Did you use the Contribution guide and is there anything you'd change with it?
  • How was the response/feedback during the contribution?
  • Any thoughts on how we could improve for future contributions?
  • Would you be interested in contributing again in any capacity?

You can leave it here as a comment or else you can find my email on my GitHub profile if you'd prefer to do so privately!

Thanks again!

Best,
Eddie

@bkpcoding (Contributor, Author)

Hey @eddiebergman, thanks for merging the PR and for all the trouble you took with the review and suggestions. I think the entire codebase is very well written, and after reading the paper and going through the documentation I was pretty much able to grasp the implementation of the library.
The contribution guide was extremely useful; I was able to set up the development environment without any hassle.
Any problems I faced were due to my incomplete understanding of git and git commands; however, a quick Google search sorted them out.
I would love to contribute to the library as much as I can. Sorry for the late replies, as I had exams, but now that they have ended I will be checking the issues regularly.

@bkpcoding bkpcoding deleted the my_new_branch branch February 18, 2022 19:19
@eddiebergman (Contributor)

Great to hear :) The code base can always be improved but step by step!

Git is always one of those things that takes time; I still don't know much of its functionality, but there's always plenty more to learn. If there's anything you found helpful to know about git, feel free to throw it into the Contribution guide.

If you like, I can tag you on issues that I don't think require going too deep or require extensive changes, otherwise feel free to poke around whenever you feel like it!

Best of luck with the exams!

Best,
Eddie

@bkpcoding (Contributor, Author)

Sure, you can tag me on the issues you think would be relevant to me; I would be happy to look at them.

eddiebergman added a commit that referenced this pull request Aug 18, 2022
* Fix the ensemble_size == 0 error in the fit_ensemble and show_models functions by adding a ValueError to the former and issuing a warning and returning an empty dictionary in the latter

* Update automl.py

* Two tests for ensemble_size == 0 cases

Added two tests to check that automl.fit_ensemble() raises an error when ensemble_size == 0 and that show_models() returns an empty dictionary when ensemble_size == 0

* Update automl.py

* Update test_automl.py

Test checking that the show_models() function raises an error if models are not fitted.

* Update automl.py

Add a function __sklearn_is_fitted__() which returns the boolean value of self.fitted, and add the check for model fitting in the show_models() function.

* Update autosklearn/automl.py

* Formatting changes to clear all the pre-commit tests

Co-authored-by: Eddie Bergman <eddiebergmanhs@gmail.com>