AI (AutoML) feature not working on recent builds - hash mismatch #306

Open
JDRomano2 opened this issue Jan 19, 2021 · 9 comments

@JDRomano2
Contributor

Recent changes seem to have 'broken' the AI feature. Regular ML algorithms can be run, but the "AI" button in the upper right corner of each database on the "Databases" dashboard seems to be permanently inactive.

I have tested this on both a macOS laptop and a Raspberry Pi 400. Interestingly, although the AI feature doesn't work on either of them, an informative error message is only given on the Raspberry Pi.

Excerpt from the Raspberry Pi logs:

...
lab_1      | 1|ai     | surprise_recommenders: INFO: setting training data...
lab_1      | 1|ai     | base: INFO: updating hash_2_param...
lab_1      | 1|ai     | base: INFO: storing parameter hash...
lab_1      | 1|ai     | surprise_recommenders: INFO: append and drop dupes
lab_1      | 1|ai     | surprise_recommenders: INFO: load_from_df
lab_1      | 1|ai     | surprise_recommenders: ERROR: the results_df hash from the pickle is different
lab_1      | 1|ai     | Traceback (most recent call last):
lab_1      | 1|ai     |   File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
lab_1      | 1|ai     |     "__main__", mod_spec)
lab_1      | 1|ai     |   File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
lab_1      | 1|ai     |     exec(code, run_globals)
lab_1      | 1|ai     |   File "/appsrc/ai/ai.py", line 662, in <module>
lab_1      | 1|ai     |     main()
lab_1      | 1|ai     |   File "/appsrc/ai/ai.py", line 631, in main
lab_1      | 1|ai     |     term_condition=args.TERM_COND, max_time=args.MAX_TIME)
lab_1      | 1|ai     |   File "/appsrc/ai/ai.py", line 186, in __init__
lab_1      | 1|ai     |     self.initialize_recommenders(rec_class) # set self.rec_engines
lab_1      | 1|ai     |   File "/appsrc/ai/ai.py", line 247, in initialize_recommenders
lab_1      | 1|ai     |     self.rec_engines[pred_type] = rec_class(**recArgs)
lab_1      | 1|ai     |   File "/appsrc/ai/recommender/surprise_recommenders.py", line 126, in __init__
lab_1      | 1|ai     |     random_state=random_state)
lab_1      | 1|ai     |   File "/appsrc/ai/recommender/base.py", line 165, in __init__
lab_1      | 1|ai     |     serialized_rec_filename)
lab_1      | 1|ai     |   File "/appsrc/ai/recommender/base.py", line 195, in _train_empty_rec
lab_1      | 1|ai     |     self.load(self.serialized_rec_path, knowledgebase_results)
lab_1      | 1|ai     |   File "/appsrc/ai/recommender/surprise_recommenders.py", line 175, in load
lab_1      | 1|ai     |     source='knowledgebase')
lab_1      | 1|ai     |   File "/appsrc/ai/recommender/surprise_recommenders.py", line 162, in _reconstruct_training_data
lab_1      | 1|ai     |     raise ValueError(error_msg)
lab_1      | 1|ai     | ValueError: the results_df hash from the pickle is different
lab_1      | PM2      | App [ai:1] exited with code [1] via signal [SIGINT]
lab_1      | PM2      | App [ai:1] starting in -fork mode-
lab_1      | PM2      | App [ai:1] online
lab_1      | 1|ai     | ======= Penn AI =======
lab_1      | 0|lab    | POST /api/projects 200 - - 4.529 ms
lab_1      | 0|lab    | serverSocket.emitEvent('recommenderStatusUpdated', '[object Object]')
lab_1      | 0|lab    | {}
lab_1      | 0|lab    | =socketServer:recommenderStatusUpdated(initializing)
lab_1      | 0|lab    | POST /api/recommender/status 200 54 - 4.591 ms
lab_1      | 1|ai     | ai: INFO: loading pmlb knowledgebase
lab_1      | 1|ai     | knowledgebase_utils: INFO: load_default_knowledgebases('True', 'data/knowledgebases/user/results', 'data/knowledgebases/user/metafeatures'
lab_1      | 1|ai     | knowledgebase_utils: INFO: load_knowledgebase('['data/knowledgebases/sklearn-benchmark-data-knowledgebase-r6.tsv.gz', 'data/knowledgebases/pmlb_regression_results.pkl.gz']', ['data/knowledgebases/pmlb_classification_metafeatures.csv.gz', 'data/knowledgebases/pmlb_regression_metafeatures.csv.gz']', '')
lab_1      | 1|ai     | knowledgebase_utils: INFO: _load_results_from_file(data/knowledgebases/sklearn-benchmark-data-knowledgebase-r6.tsv.gz)
lab_1      | 1|ai     | knowledgebase_utils: INFO: returning 52249 results from data/knowledgebases/sklearn-benchmark-data-knowledgebase-r6.tsv.gz
lab_1      | 1|ai     | knowledgebase_utils: INFO: _load_results_from_file(data/knowledgebases/pmlb_regression_results.pkl.gz)
lab_1      | 1|ai     | knowledgebase_utils: INFO: concatenating results....
lab_1      | 1|ai     | knowledgebase_utils: INFO: load metafeatures...
lab_1      | 1|ai     | knowledgebase_utils: INFO: Loading metadata from file 'data/knowledgebases/pmlb_classification_metafeatures.csv.gz
lab_1      | 1|ai     | knowledgebase_utils: INFO: Loading metadata from file 'data/knowledgebases/pmlb_regression_metafeatures.csv.gz
lab_1      | 1|ai     | ai: INFO: updating AI with classification knowledgebase (52249 results)
lab_1      | 1|ai     | ai: INFO: pmlb classification knowledgebase loaded
...

And similarly, from macOS:

...
lab_1      | 1|ai     | base: WARNING: algo changing from <surprise.prediction... to <surprise.prediction...
lab_1      | 1|ai     | base: WARNING: first_fit changing from True... to False...
lab_1      | 1|ai     | base: WARNING: reader changing from <surprise.reader.Rea... to <surprise.reader.Rea...
lab_1      | 1|ai     | base: WARNING: hash_2_param changing from {'c65edfb84911c2647a... to {'c65edfb84911c2647a...
lab_1      | 1|ai     | base: WARNING: adding trainset=<surprise.trainset.T...
lab_1      | 1|ai     | base: WARNING: adding results_df_hash=9213096e6869a9a4d9ea...
lab_1      | 1|ai     | base: WARNING: adding ml_p_hash=31fa2d17c46be017c19f...
lab_1      | 1|ai     | base: INFO: updating internal state
lab_1      | 1|ai     | base: INFO: ml_p hashes match
lab_1      | 1|ai     | surprise_recommenders: INFO: setting training data...
lab_1      | 1|ai     | base: INFO: updating hash_2_param...
lab_1      | PM2      | App [ai:1] exited with code [0] via signal [SIGKILL]
lab_1      | PM2      | App [ai:1] starting in -fork mode-
lab_1      | PM2      | App [ai:1] online
lab_1      | 1|ai     | ======= Penn AI =======
lab_1      | 0|lab    | POST /api/projects 200 - - 23.081 ms
lab_1      | 0|lab    | serverSocket.emitEvent('recommenderStatusUpdated', '[object Object]')
lab_1      | 0|lab    | {}
lab_1      | 0|lab    | =socketServer:recommenderStatusUpdated(initializing)
lab_1      | 0|lab    | POST /api/recommender/status 200 54 - 15.614 ms
lab_1      | 1|ai     | ai: INFO: loading pmlb knowledgebase
lab_1      | 1|ai     | knowledgebase_utils: INFO: load_default_knowledgebases('True', 'data/knowledgebases/user/results', 'data/knowledgebases/user/metafeatures'
lab_1      | 1|ai     | knowledgebase_utils: INFO: load_knowledgebase('['data/knowledgebases/sklearn-benchmark-data-knowledgebase-r6.tsv.gz', 'data/knowledgebases/pmlb_regression_results.pkl.gz']', ['data/knowledgebases/pmlb_classification_metafeatures.csv.gz', 'data/knowledgebases/pmlb_regression_metafeatures.csv.gz']', '')
lab_1      | 1|ai     | knowledgebase_utils: INFO: _load_results_from_file(data/knowledgebases/sklearn-benchmark-data-knowledgebase-r6.tsv.gz)
lab_1      | 0|lab    | results:
lab_1      | 0|lab    | [ { _id: 5fe3e870e2c61a175b7b7928,
lab_1      | 0|lab    |     type: 'recommender',
lab_1      | 0|lab    |     status: 'initializing' } ]
lab_1      | 0|lab    | GET /api/recommender 201 79 - 4.873 ms
lab_1      | 1|ai     | knowledgebase_utils: INFO: returning 52249 results from data/knowledgebases/sklearn-benchmark-data-knowledgebase-r6.tsv.gz
lab_1      | 1|ai     | knowledgebase_utils: INFO: _load_results_from_file(data/knowledgebases/pmlb_regression_results.pkl.gz)
lab_1      | 1|ai     | knowledgebase_utils: INFO: concatenating results....
lab_1      | 1|ai     | knowledgebase_utils: INFO: load metafeatures...
lab_1      | 1|ai     | knowledgebase_utils: INFO: Loading metadata from file 'data/knowledgebases/pmlb_classification_metafeatures.csv.gz
lab_1      | 1|ai     | knowledgebase_utils: INFO: Loading metadata from file 'data/knowledgebases/pmlb_regression_metafeatures.csv.gz
...

The Raspberry Pi logs suggest the problem occurs while loading the knowledge base: the results_df hash stored in the pickled recommender doesn't match the hash of the loaded dataset.

Possibly an issue caused after running BFG Repo-Cleaner, or related to Git LFS?
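
For readers unfamiliar with this kind of check, the sketch below illustrates the general idea: a digest of the training results is stored with the pickled recommender and compared against a digest of the freshly loaded knowledge base. This is not the code in surprise_recommenders.py. The helpers `hash_results` and `check_results_hash` and the sample rows are hypothetical, and hashing a text serialization with SHA-256 is an assumption.

```python
import hashlib


def hash_results(rows):
    """Return a SHA-256 hex digest of an iterable of result rows.

    Hypothetical helper: each row is serialized with repr() and rows are
    hashed in the order given, so both order and formatting matter.
    """
    digest = hashlib.sha256()
    for row in rows:
        digest.update(repr(tuple(row)).encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def check_results_hash(rows, stored_hash):
    """Raise ValueError when the rows do not match the stored digest."""
    new_hash = hash_results(rows)
    if new_hash != stored_hash:
        raise ValueError("the results_df hash from the pickle is different")
    return new_hash


# Made-up rows standing in for knowledge base results.
rows = [("dataset_a", "algo_1", 0.91), ("dataset_b", "algo_2", 0.87)]
stored = hash_results(rows)

check_results_hash(rows, stored)            # same data: passes silently
try:
    check_results_hash(rows[::-1], stored)  # same rows, different order
except ValueError as err:
    print(err)
```

Any difference in row order, data types, or float formatting between the machine that produced the pickle and the one loading it would change such a digest. That is one possible, unconfirmed, explanation for a failure that appears only on some platforms.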

@JDRomano2
Contributor Author

Steps to reproduce:

$ git clone https://github.com/epistasislab/pennai
$ cd pennai
$ cp config/ai.env-template config/ai.env
$ docker-compose build
$ docker-compose up

@lacava
Contributor

lacava commented Jan 19, 2021

What happens when you click the AI button on Mac?

@JDRomano2
Contributor Author

JDRomano2 commented Jan 19, 2021

@lacava The AI button is grayed out, and in its place there is a spinning grey progress wheel. This is the case on both the Mac and the Raspberry Pi.

@hjwilli
Collaborator

hjwilli commented Jan 21, 2021

@JDRomano2, for the Mac, I think we need a little more information. Could you post a larger excerpt of the log?

Then could you:

  • Start pennai, wait a few minutes, and see whether anything changes (if the SVD recommender couldn't be loaded and is being trained instead, that can take a few minutes)
  • Start with a different recommender (in config/ai.env, change AI_RECOMMENDER to "random" and restart pennai) and see whether the AI button becomes active
  • Check your Docker runtime memory settings. What are they currently? We recommend at least 6 GB of memory.
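
The second and third checks can be scripted. A minimal sketch, assuming config/ai.env uses plain KEY=value lines and that the Docker CLI is on the PATH. The helpers `set_env_value` and `docker_memory_gib` are illustrative names, not part of PennAI, and the sample file contents are made up.

```python
import re
import subprocess


def set_env_value(text, key, value):
    """Return env-file text with `key` set to `value`.

    Replaces an existing KEY=... line, or appends one if the key is absent.
    """
    pattern = re.compile(rf"^{re.escape(key)}=.*$", flags=re.MULTILINE)
    line = f"{key}={value}"
    if pattern.search(text):
        # A function replacement avoids backslash escapes in `value`.
        return pattern.sub(lambda _match: line, text)
    sep = "" if text.endswith("\n") or not text else "\n"
    return f"{text}{sep}{line}\n"


def docker_memory_gib():
    """Ask the Docker daemon for its total memory, converted to GiB."""
    out = subprocess.run(
        ["docker", "info", "--format", "{{.MemTotal}}"],
        capture_output=True, text=True, check=True,
    ).stdout
    return int(out.strip()) / 1024 ** 3


# Made-up file contents; OTHER_SETTING is a placeholder key.
sample = "AI_RECOMMENDER=svd\nOTHER_SETTING=1\n"
print(set_env_value(sample, "AI_RECOMMENDER", "random"))

# On a machine with Docker running, compare this with the 6 GB guidance:
#   print(f"{docker_memory_gib():.1f} GiB available to Docker")
```

The value Docker reports can be slightly below the configured limit, so treat a reading just under 6 as borderline rather than as a failure.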

From the logs, there might be a different issue with the Raspberry Pi. A good first step there might be for us to get the unit tests running on the Pi and check that they all pass.

@JDRomano2
Contributor Author

Following these three suggestions did the trick on the Mac, so the remaining problem must be specific to images running on the Pi.

I'll close this issue and continue work on the raspberrypi branch to get this up and running. As recommended, I'll focus on the unit tests. Given the constrained resources on the Pi, this may require some creative tweaking to convince everything to work correctly.

@hjwilli
Collaborator

hjwilli commented Jan 21, 2021

Hi @JDRomano2, excellent! Do you know which of these fixed it? Just to check, are you now able to run the SVD recommender on your Mac?

@JDRomano2
Contributor Author

I suspect it was increasing the available RAM that did the trick. I just went in and re-enabled the SVD recommender and it still works correctly, so no problem there.
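
The logs earlier in this thread are consistent with that explanation: on the Mac the ai process was stopped with SIGKILL and no Python traceback, which is what an out-of-memory kill often looks like, whereas on the Pi it exited with code 1 after the ValueError. The sketch below pulls those PM2 exit lines out of a log. The regular expression is inferred from the excerpts above, and `find_pm2_exits` is a hypothetical helper.

```python
import re

# Pattern inferred from lines such as:
#   PM2      | App [ai:1] exited with code [1] via signal [SIGINT]
EXIT_RE = re.compile(
    r"App \[(?P<app>[^:\]]+):(?P<id>\d+)\] exited with code "
    r"\[(?P<code>\d+)\] via signal \[(?P<signal>\w+)\]"
)


def find_pm2_exits(log_text):
    """Return (app, exit_code, signal) for each PM2 exit line found."""
    return [
        (m["app"], int(m["code"]), m["signal"])
        for m in EXIT_RE.finditer(log_text)
    ]


# Two exit lines copied from the excerpts in this thread.
log = """\
lab_1      | PM2      | App [ai:1] exited with code [0] via signal [SIGKILL]
lab_1      | PM2      | App [ai:1] exited with code [1] via signal [SIGINT]
"""
for app, code, signal in find_pm2_exits(log):
    print(app, code, signal)
```

A SIGKILL exit alone does not prove memory exhaustion, so it is worth confirming against the Docker memory settings.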

@hjwilli
Collaborator

hjwilli commented Jan 21, 2021

Great, thanks!

@jay-m-dev
Contributor

jay-m-dev commented Dec 22, 2022

Encountered this same issue on arm64. Setting the recommender to svd produces an error when starting Aliro. Steps to reproduce:

  • On an arm64 machine, set RECOMMENDER=svd and run docker compose up

The following error is displayed:

aliro-lab-1 | 1|ai | newHash d617b188ab49492d3c37bb083a37bd31cbcf3acc077d7bd3ab697115196c617c
aliro-lab-1 | 1|ai | test_newHash: 031edd7d41651593c5fe5c006fa5752b37fddff7bc4e843aa6af0c950f4b9406
aliro-lab-1 | 1|ai | self.results_df_hash 5a246d759bb571dbd867344ef8f282ca7b0cce46347f6db58986ffec8985eb34
aliro-lab-1 | 1|ai | newHash == self.results_df_hash False
aliro-lab-1 | 1|ai | surprise_recommenders: ERROR: the results_df hash from the pickle is different
aliro-lab-1 | 1|ai | Traceback (most recent call last):
aliro-lab-1 | 1|ai |   File "/usr/local/lib/python3.7/runpy.py", line 193, in _run_module_as_main
aliro-lab-1 | 1|ai |     "__main__", mod_spec)
aliro-lab-1 | 1|ai |   File "/usr/local/lib/python3.7/runpy.py", line 85, in _run_code
aliro-lab-1 | 1|ai |     exec(code, run_globals)
aliro-lab-1 | 1|ai |   File "/appsrc/ai/ai.py", line 658, in <module>
aliro-lab-1 | 1|ai |     main()
aliro-lab-1 | 1|ai |   File "/appsrc/ai/ai.py", line 627, in main
aliro-lab-1 | 1|ai |     term_condition=args.TERM_COND, max_time=args.MAX_TIME)
aliro-lab-1 | 1|ai |   File "/appsrc/ai/ai.py", line 182, in __init__
aliro-lab-1 | 1|ai |     self.initialize_recommenders(rec_class) # set self.rec_engines
aliro-lab-1 | 1|ai |   File "/appsrc/ai/ai.py", line 243, in initialize_recommenders
aliro-lab-1 | 1|ai |     self.rec_engines[pred_type] = rec_class(**recArgs)
aliro-lab-1 | 1|ai |   File "/appsrc/ai/recommender/surprise_recommenders.py", line 126, in __init__
aliro-lab-1 | 1|ai |     random_state=random_state)
aliro-lab-1 | 1|ai |   File "/appsrc/ai/recommender/base.py", line 165, in __init__
aliro-lab-1 | 1|ai |     serialized_rec_filename)
aliro-lab-1 | 1|ai |   File "/appsrc/ai/recommender/base.py", line 195, in _train_empty_rec
aliro-lab-1 | 1|ai |     self.load(self.serialized_rec_path, knowledgebase_results)
aliro-lab-1 | 1|ai |   File "/appsrc/ai/recommender/surprise_recommenders.py", line 212, in load
aliro-lab-1 | 1|ai |     source='knowledgebase')
aliro-lab-1 | 1|ai |   File "/appsrc/ai/recommender/surprise_recommenders.py", line 199, in _reconstruct_training_data
aliro-lab-1 | 1|ai |     raise ValueError(error_msg)
aliro-lab-1 | 1|ai | ValueError: the results_df hash from the pickle is different
aliro-lab-1 | PM2 | App [ai:1] exited with code [1] via signal [SIGINT]
aliro-lab-1 | PM2 | App [ai:1] starting in -fork mode-
aliro-lab-1 | PM2 | App [ai:1] online
aliro-lab-1 | 1|ai | ======= Aliro =======
aliro-lab-1 | 0|lab | POST /api/projects 200 - - 2.737 ms
aliro-lab-1 | 0|lab | serverSocket.emitEvent('recommenderStatusUpdated', '[object Object]')
aliro-lab-1 | 0|lab | {}
aliro-lab-1 | 0|lab | POST /api/recommender/status 200 54 - 1.747 ms

@jay-m-dev jay-m-dev reopened this Dec 22, 2022
@jay-m-dev jay-m-dev changed the title AI (AutoML) feature not working on recent builds AI (AutoML) feature not working on recent builds - hash mismatch Dec 28, 2022