Change HP Name & Include Text example #1410

Louquinze · 2022-02-19T23:27:56Z

Handles the following issue:
#1373 (comment)

This commit fixes the first 3 bullet points on the to-do list.

1. Rename the hyperparameter "ngram_range" --> "ngram_upper_bound"; this includes changing all *csv and *json files.
2. Create a new text preprocessing example, example_text_preprocessing.py; this new example features the 20 Newsgroups dataset.

The import in example_text_preprocessing.py is too long, but I cannot come up with a good solution.
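The rename across the *csv and *json files could be done with a small script. The following is only a minimal sketch, assuming a plain textual replacement is sufficient; the helper name `rename_hyperparameter` and the directory passed to it are hypothetical, and the thread does not say how the files were actually changed.

```python
from pathlib import Path
from typing import List


def rename_hyperparameter(root: Path, old: str, new: str) -> List[Path]:
    """Replace `old` with `new` in every *.csv and *.json file below `root`.

    Returns the files that were modified. This is a plain substring
    replacement, so it would also rewrite longer names that merely
    contain `old`.
    """
    changed = []
    for path in sorted(root.rglob("*")):
        # Only touch regular files with the two extensions named in the PR.
        if not path.is_file() or path.suffix not in {".csv", ".json"}:
            continue
        text = path.read_text(encoding="utf-8")
        if old in text:
            path.write_text(text.replace(old, new), encoding="utf-8")
            changed.append(path)
    return changed
```

A call such as `rename_hyperparameter(Path("path/to/metalearning/files"), "ngram_range", "ngram_upper_bound")` would then report which files were rewritten; the path here is a placeholder.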


codecov · 2022-02-21T14:56:13Z

Codecov Report

Merging #1410 (bac27b9) into development (00b8e6e) will decrease coverage by 0.00%.
The diff coverage is n/a.

@@               Coverage Diff               @@
##           development    #1410      +/-   ##
===============================================
- Coverage        84.52%   84.51%   -0.01%     
===============================================
  Files              146      146              
  Lines            11283    11283              
  Branches          1929     1929              
===============================================
- Hits              9537     9536       -1     
- Misses            1231     1232       +1     
  Partials           515      515


mfeurer

@eddiebergman any idea how to fix PEP8 here?


mfeurer · 2022-02-23T08:17:34Z

examples/40_advanced/example_text_preprocessing.py

+# ==========================
+
+automl = autosklearn.classification.AutoSklearnClassifier(
+    # set the time high enough text preprocessing can create many new features


Does 20 newsgroup work in the setting on the left? That would be preferable for running this example in the github actions.

Maybe we should also use a smaller dataset? You can use the following script to scan on OpenML for datasets containing string data:

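import openml

# List all datasets known to OpenML (metadata only).
datasets = openml.datasets.list_datasets()
for did in datasets:
    try:
        # Fetch only the dataset description, not the data or its qualities.
        dataset = openml.datasets.get_dataset(
            did, download_data=False, download_qualities=False
        )
        # Report the dataset as soon as one string-typed feature is found.
        for feat in dataset.features:
            if dataset.features[feat].data_type == 'string':
                print(did, dataset.name)
                break
    except Exception as e:
        print(e)
        continue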

Louquinze · 2022-02-24

The example yields ~80% accuracy on the test set. Random guessing would give 5% for 20 labels, so I would say that the example works. But it also runs for 300 seconds (5 minutes), so if that is too long I can look for another dataset.
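The 5% figure is the expected accuracy of guessing uniformly at random among 20 labels. A minimal sketch of that arithmetic, with `chance_accuracy` as a hypothetical helper name:

```python
def chance_accuracy(n_labels: int) -> float:
    """Expected accuracy when guessing uniformly at random among n_labels classes."""
    if n_labels < 1:
        raise ValueError("n_labels must be at least 1")
    return 1.0 / n_labels


# 20 labels, as in the full 20 Newsgroups dataset.
print(f"{chance_accuracy(20):.0%}")  # prints 5%
```

With the dataset restricted to 2 labels, the same baseline rises to 50%, so accuracy figures from the reduced example are not directly comparable to the ~80% quoted here.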

mfeurer · 2022-02-24

Sorry, I meant, would the example work when you restrict it to use only a single configuration?

Louquinze · 2022-02-25

Is there a parameter for restricting auto-sklearn to that, or does it mean max_time == time per model?

eddiebergman · 2022-02-25

I would read through the entire API and manual now that you have a bit more familiarity, to know what's possible and what's not:
https://automl.github.io/auto-sklearn/master/api.html

mfeurer · 2022-03-01

It has been there in the previous version of the example: smac_scenario_args={"runcount_limit": 1}
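As a rough sketch of how that argument fits into the example, the settings could be collected as below. The helper `automl_settings` is hypothetical; the time limits are the values that appear in this PR's diff, and the classifier is only constructed if auto-sklearn happens to be installed.

```python
def automl_settings(single_configuration: bool = True) -> dict:
    """Keyword arguments for AutoSklearnClassifier, as assumed in this sketch."""
    settings = {
        "time_left_for_this_task": 60,  # overall budget in seconds
        "per_run_time_limit": 30,  # budget per model fit in seconds
    }
    if single_configuration:
        # Stop SMAC after evaluating a single configuration.
        settings["smac_scenario_args"] = {"runcount_limit": 1}
    return settings


if __name__ == "__main__":
    try:
        import autosklearn.classification
    except ImportError:
        print("auto-sklearn is not installed; settings:", automl_settings())
    else:
        automl = autosklearn.classification.AutoSklearnClassifier(
            **automl_settings()
        )
        print(automl)
```

Limiting the run count keeps the example cheap enough for CI, at the cost of an ensemble that may contain only one model.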


eddiebergman · 2022-02-23T10:38:58Z

The "line too long" errors in pre-commit can be fixed by adding # noqa: E501 to the end of those lines. I was rethinking that perhaps a line length of 100 is fine, but that's a separate thing to discuss, and it wouldn't prevent these errors anyway.

The other solution is to have the modules imported in the package's __init__ so the imports aren't this long.
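For illustration, this is what the suppression looks like on an over-long import. The import in the PR itself is not shown in this thread, so a standard-library import (Python 3.8+) stands in for it here.

```python
# The marker at the end silences only the E501 (line too long) check,
# and only for this one line; other checks still apply.
from importlib.metadata import PackageNotFoundError, distribution, entry_points, files, metadata, requires, version  # noqa: E501

try:
    print(version("pip"))
except PackageNotFoundError:
    print("pip is not installed in this environment")
```

The `__init__` alternative would instead re-export the deeply nested names from a shorter package path, so that the importing line fits within the limit without a marker.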


mfeurer · 2022-02-24T16:56:07Z

The doc build failure appeared to be unrelated, so I just restarted it. However, the pre-commit fails right now, could you please have a look into this?

eddiebergman · 2022-02-24T17:14:22Z

> The doc build failure appeared to be unrelated, so I just restarted it. However, the pre-commit fails right now, could you please have a look into this?

Run make format to fix the formatting. This bug with the leaderboard, as shown in the docs, is triggered by this example. While not directly related to the example, it often occurs when no models are found, as the IDs of the models get mixed up.


Louquinze · 2022-02-24T17:22:13Z

> The doc build failure appeared to be unrelated, so I just restarted it. However, the pre-commit fails right now, could you please have a look into this?
>
> make format for the formatting. This bug with the leaderboard as shown in the docs is triggered by this example. While not directly related to the example, it often occurs when no models are found as the IDs of the models get messed up.

I ran make format; pre-commit works now.


mfeurer

That looks good now. I believe the example could be drastically simplified by restricting the categories via the categories argument of the function that loads 20 Newsgroups (fetch_20newsgroups).


mfeurer · 2022-03-01T16:42:35Z

examples/40_advanced/example_text_preprocessing.py

+cats = ["comp.sys.ibm.pc.hardware", "rec.sport.baseball"]
+X_train, y_train = fetch_20newsgroups(
+    subset="train",  # select train set
+    shuffle=True,  # shuffle the data set for unbiased validation results


Suggested change

shuffle=True, # shuffle the data set for unbiased validation results

shuffle=True, # shuffle the data set for unbiased validation results

mfeurer · 2022-03-01T16:42:44Z

examples/40_advanced/example_text_preprocessing.py

+)  # load this two columns separately as numpy array
+
+X_test, y_test = fetch_20newsgroups(
+    subset="test",  # select test set for unbiased evaluation


Suggested change

subset="test", # select test set for unbiased evaluation

subset="test", # select test set for unbiased evaluation


mfeurer · 2022-03-01T16:44:12Z

examples/40_advanced/example_text_preprocessing.py

-    # set the time high enough text preprocessing can create many new features
-    time_left_for_this_task=300,
-    per_run_time_limit=30,
+    time_left_for_this_task=60,  # absolute time limit for fitting the ensemble


Suggested change

time_left_for_this_task=60, # absolute time limit for fitting the ensemble

time_left_for_this_task=60,


* rename "ngram_range" to "ngram_upper_bound" this includes renaming it in all *csv and *json files for metalearning
* handle the following issue #1373 (comment) this commit fixes the first 3 bullet points on the to do list. 1. rename hyperparameter "ngram_range" --> "ngram_upper_bound" this includes changing all *csv and *json files 2. Create a new textpreprocessing example_text_preprocessing.py, this new example features the 20Newsgroups dataset import in example_text_preprocessing.py to long, but i can not come up with a good solution
* handle the following issue #1373 (comment) this commit fixes the first 3 bullet points on the to do list. 1. rename hyperparameter "ngram_range" --> "ngram_upper_bound" this includes changing all *csv and *json files 2. Create a new textpreprocessing example_text_preprocessing.py, this new example features the 20Newsgroups dataset import in example_text_preprocessing.py to long, but i can not come up with a good solution include feedback from 02.24.
* limit 20NG to 5 labels. automl.leaderboard has problems if the ensamble contains only one model. Therefore we reduced the problem complexity
* limit 20NG to 2 labels. automl.leaderboard has problems if the ensamble contains only one model. Therefore we reduced the problem complexity

Louquinze and others added 3 commits February 19, 2022 23:28

rename "ngram_range" to "ngram_upper_bound" this includes renaming it in all *csv and *json files for metalearning
45b3b7e

rename "ngram_range" to "ngram_upper_bound" this includes renaming it in all *csv and *json files for metalearning
e3ef23f

Merge branch 'automl:development' into development
666086d

Louquinze changed the title ~~Development~~ Change HP Name & Include Text example Feb 21, 2022

Louquinze requested review from eddiebergman and mfeurer February 21, 2022 16:20

mfeurer reviewed Feb 23, 2022

eddiebergman reviewed Feb 23, 2022

mfeurer reviewed Feb 25, 2022

Louquinze added 4 commits February 25, 2022 13:47

limit 20NG to 5 labels. automl.leaderboard has problems if the ensamble contains only one model. Therefore we reduced the problem complexity
f96b758

mfeurer requested changes Mar 1, 2022

limit 20NG to 5 labels. automl.leaderboard has problems if the ensamble contains only one model. Therefore we reduced the problem complexity
38e5e2f

Louquinze requested a review from mfeurer March 1, 2022 16:10

limit 20NG to 2 labels. automl.leaderboard has problems if the ensamble contains only one model. Therefore we reduced the problem complexity
93d2164

mfeurer reviewed Mar 1, 2022

limit 20NG to 2 labels. automl.leaderboard has problems if the ensamble contains only one model. Therefore we reduced the problem complexity
bac27b9

Louquinze requested a review from mfeurer March 1, 2022 17:20

mfeurer approved these changes Mar 2, 2022

mfeurer merged commit ab5c016 into automl:development Mar 2, 2022

github-actions bot pushed a commit that referenced this pull request Mar 2, 2022

Lukas Strack: Change HP Name & Include Text example (#1410)
3b4b40e


