Change HP Name & Include Text example #1410
Conversation
… in all *csv and *json files for metalearning
Codecov Report
@@ Coverage Diff @@
## development #1410 +/- ##
===============================================
- Coverage 84.52% 84.51% -0.01%
===============================================
Files 146 146
Lines 11283 11283
Branches 1929 1929
===============================================
- Hits 9537 9536 -1
- Misses 1231 1232 +1
Partials 515 515
automl#1373 (comment) — this commit fixes the first 3 bullet points on the to-do list. 1. Rename hyperparameter "ngram_range" --> "ngram_upper_bound"; this includes changing all *csv and *json files. 2. Create a new text-preprocessing example, example_text_preprocessing.py; this new example features the 20Newsgroups dataset. The import in example_text_preprocessing.py is too long, but I cannot come up with a good solution.
@eddiebergman any idea how to fix PEP8 here?
# ==========================

automl = autosklearn.classification.AutoSklearnClassifier(
    # set the time high enough text preprocessing can create many new features
Does 20 newsgroup work in the setting on the left? That would be preferable for running this example in the github actions.
Maybe we should also use a smaller dataset? You can use the following script to scan on OpenML for datasets containing string data:
import openml

datasets = openml.datasets.list_datasets()
for did in datasets:
    try:
        dataset = openml.datasets.get_dataset(
            did, download_data=False, download_qualities=False
        )
        for feat in dataset.features:
            if dataset.features[feat].data_type == 'string':
                print(did, dataset.name)
                break
    except Exception as e:
        print(e)
        continue
The example yields ~80% accuracy on the test set. Selecting at random would give 5% for 20 labels, so I would say the example works. But it also runs for 300 seconds, i.e. 5 minutes. If that is too long, I can search for another dataset.
Sorry, I meant, would the example work when you restrict it to use only a single configuration?
Is there a parameter for restricting auto-sklearn to a single configuration, or is that max_time == time per model?
I would read through the entire API and manual now that you have a bit more familiarity, to know what's possible and what's not
https://automl.github.io/auto-sklearn/master/api.html
It has been there in the previous version of the example: smac_scenario_args={"runcount_limit": 1}
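For context, a minimal sketch of what that restriction looks like in the classifier's keyword arguments. `smac_scenario_args` and `"runcount_limit"` are the auto-sklearn/SMAC names mentioned above; the time limits here are illustrative assumptions, not the example's final values:

```python
# Keyword arguments for autosklearn.classification.AutoSklearnClassifier.
# "runcount_limit" caps how many configurations SMAC may evaluate, so a
# value of 1 makes the run fit exactly one model.
automl_kwargs = dict(
    time_left_for_this_task=60,  # illustrative overall budget (seconds)
    per_run_time_limit=30,       # illustrative per-model budget (seconds)
    smac_scenario_args={"runcount_limit": 1},
)

# automl = autosklearn.classification.AutoSklearnClassifier(**automl_kwargs)
```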
The line-too-long errors in pre-commit can be fixed by adding … The other solution is to have the module imports in the …
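For reference, a sketch of the two usual ways to deal with flake8's E501 (line too long) in pre-commit, independent of the specific line in this PR; the names below are made up for illustration:

```python
# Option 1: wrap the expression in parentheses; adjacent string literals
# are implicitly concatenated, so the line can be split anywhere.
long_message = (
    "This string would blow past the 79-character limit on one line, "
    "so it is split across two literals inside parentheses."
)

# Option 2: suppress the check for a single line that cannot be split,
# e.g. a long URL (use sparingly).
api_url = "https://automl.github.io/auto-sklearn/master/api.html"  # noqa: E501
```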
The doc build failure appeared to be unrelated, so I just restarted it. However, the …
I did make format; pre-commit works now.
…le contains only one model. Therefore we reduced the problem complexity
That looks good now. I believe the example could be drastically simplified by restricting the categories via the categories argument of fetch_20newsgroups.
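A sketch of that simplification: the two category names are taken from the diff below, and the filtering shown on hypothetical stand-in data mirrors what fetch_20newsgroups(categories=...) does for you:

```python
# Hypothetical stand-in for 20 Newsgroups (text, label) pairs.
samples = [
    ("my IDE controller is acting up", "comp.sys.ibm.pc.hardware"),
    ("great ninth inning last night", "rec.sport.baseball"),
    ("new images from the probe", "sci.space"),
]

# Restricting to two categories turns the task into binary classification.
cats = {"comp.sys.ibm.pc.hardware", "rec.sport.baseball"}
subset = [(text, label) for text, label in samples if label in cats]

# With scikit-learn, this restriction is a single argument:
# X, y = fetch_20newsgroups(subset="train", categories=sorted(cats),
#                           return_X_y=True)
```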
cats = ["comp.sys.ibm.pc.hardware", "rec.sport.baseball"]

X_train, y_train = fetch_20newsgroups(
    subset="train",  # select train set
    shuffle=True,  # shuffle the data set for unbiased validation results
)  # load these two columns separately as numpy arrays

X_test, y_test = fetch_20newsgroups(
    subset="test",  # select test set for unbiased evaluation
# set the time high enough text preprocessing can create many new features
time_left_for_this_task=300,
per_run_time_limit=30,
time_left_for_this_task=60,  # absolute time limit for fitting the ensemble
Suggested change:
- time_left_for_this_task=60,  # absolute time limit for fitting the ensemble
+ time_left_for_this_task=60,
* rename "ngram_range" to "ngram_upper_bound"; this includes renaming it in all *csv and *json files for metalearning
* handle issue #1373 (comment): this commit fixes the first 3 bullet points on the to-do list. 1. Rename hyperparameter "ngram_range" --> "ngram_upper_bound"; this includes changing all *csv and *json files. 2. Create a new text-preprocessing example, example_text_preprocessing.py, featuring the 20Newsgroups dataset; the import in example_text_preprocessing.py is too long, but I cannot come up with a good solution. Include feedback from 02.24.
* limit 20NG to 5 labels; automl.leaderboard has problems if the ensemble contains only one model, therefore we reduced the problem complexity
* limit 20NG to 2 labels; automl.leaderboard has problems if the ensemble contains only one model, therefore we reduced the problem complexity