Update dependencies v1.0 #726

juhoinkinen · 2023-08-14T13:38:25Z

Updates most of the outdated dependencies, and pins Flask and rdflib versions more tightly.

Exceptions to updating to most-recent releases:

Flask is pinned to 2.2.* instead of 2.3.* due to a requirement of Connexion v2.14.2
rdflib is pinned to 6.3.* instead of 7.* due to a requirement of stwfsapy
Updating to numpy 1.25.* would require Python 3.9-3.11
Updating to scipy 1.11.* would require Python 3.9-3.11
~~Updating to Optuna 3.3.0 (from 2.10.1) is left to wait for the possible implementation changes in Annif (see issue Update to Optuna 3.x to get rid of deprecation message #532)~~

Also closes #697 and #532.

codecov · 2023-08-14T13:42:41Z

Codecov Report

Patch coverage: 100.00% and no project coverage change.

Comparison is base (40cc2fd) 99.67% compared to head (e037b78) 99.67%.

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #726   +/-   ##
=======================================
  Coverage   99.67%   99.67%           
=======================================
  Files          89       89           
  Lines        6397     6397           
=======================================
  Hits         6376     6376           
  Misses         21       21

Files Changed	Coverage Δ
annif/backend/ensemble.py	`100.00% <ø> (ø)`
annif/backend/svc.py	`100.00% <100.00%> (ø)`


juhoinkinen · 2023-08-14T14:05:04Z

After updating to scikit-learn v1.3.0, old MLLM models cease to work:

...

  File "/home/local/jmminkin/git/Annif/annif/backend/backend.py", line 142, in suggest
    self.initialize()
  File "/home/local/jmminkin/git/Annif/annif/backend/mllm.py", line 119, in initialize
    self._model = self._load_model()
  File "/home/local/jmminkin/git/Annif/annif/backend/mllm.py", line 102, in _load_model
    return MLLMModel.load(path)
  File "/home/local/jmminkin/git/Annif/annif/lexical/mllm.py", line 366, in load
    return joblib.load(filename)
  File "/home/local/jmminkin/.cache/pypoetry/virtualenvs/annif-ul-EXdhi-py3.8/lib/python3.8/site-packages/joblib/numpy_pickle.py", line 658, in load
    obj = _unpickle(fobj, filename, mmap_mode)
  File "/home/local/jmminkin/.cache/pypoetry/virtualenvs/annif-ul-EXdhi-py3.8/lib/python3.8/site-packages/joblib/numpy_pickle.py", line 577, in _unpickle
    obj = unpickler.load()
  File "/usr/lib/python3.8/pickle.py", line 1212, in load
    dispatch[key[0]](self)
  File "/home/local/jmminkin/.cache/pypoetry/virtualenvs/annif-ul-EXdhi-py3.8/lib/python3.8/site-packages/joblib/numpy_pickle.py", line 402, in load_build
    Unpickler.load_build(self)
  File "/usr/lib/python3.8/pickle.py", line 1705, in load_build
    setstate(state)
  File "sklearn/tree/_tree.pyx", line 714, in sklearn.tree._tree.Tree.__setstate__
  File "sklearn/tree/_tree.pyx", line 1418, in sklearn.tree._tree._check_node_ndarray
ValueError: node array from the pickle has an incompatible dtype:
- expected: {'names': ['left_child', 'right_child', 'feature', 'threshold', 'impurity', 'n_node_samples', 'weighted_n_node_samples', 'missing_go_to_left'], 'formats': ['<i8', '<i8', '<i8', '<f8', '<f8', '<i8', '<f8', 'u1'], 'offsets': [0, 8, 16, 24, 32, 40, 48, 56], 'itemsize': 64}
- got     : [('left_child', '<i8'), ('right_child', '<i8'), ('feature', '<i8'), ('threshold', '<f8'), ('impurity', '<f8'), ('n_node_samples', '<i8'), ('weighted_n_node_samples', '<f8')]
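
The two dtype descriptions in the error differ by a single field. A minimal sketch in plain Python, with the field names copied from the message above (the variable names are only illustrative), makes the difference explicit:

```python
# Field names copied from the "expected" and "got" parts of the ValueError.
expected_fields = [
    "left_child", "right_child", "feature", "threshold", "impurity",
    "n_node_samples", "weighted_n_node_samples", "missing_go_to_left",
]
pickled_fields = [
    "left_child", "right_child", "feature", "threshold", "impurity",
    "n_node_samples", "weighted_n_node_samples",
]

# Fields that the installed scikit-learn expects but the old pickle lacks.
missing = [name for name in expected_fields if name not in pickled_fields]
print(missing)
```

The only field absent from the old pickle is `missing_go_to_left`, which is why tree nodes saved with the older scikit-learn are rejected on load.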

osma · 2023-08-14T14:38:39Z

Good catch re: scikit-learn and MLLM. There's not much we can do about it, I'm afraid. I think it's better to update now, before 1.0, instead of postponing the inevitable. But it has to be mentioned in the release notes.

The scikit-learn documentation on Model Persistence has some notes about compatibility between different environments and versions. It mentions the PMML and ONNX formats that could possibly be more durable than serializing sklearn models directly via joblib or pickle, as we currently do in MLLM. But that would be a whole new investigation. I think we should just consider this an unfortunate situation that we couldn't avoid.
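
Short of switching formats, one option would be to store the library version alongside the pickled model so that a mismatch produces a readable error instead of the dtype `ValueError`. The following is only a sketch of that idea using the standard library. It is not what Annif does, and the names `save_model`, `load_model` and `ModelVersionError` are hypothetical. In practice the version string would presumably come from `sklearn.__version__`.

```python
import pickle
from pathlib import Path


class ModelVersionError(RuntimeError):
    """Raised when a model was saved under a different library version."""


def _major_minor(version: str) -> tuple:
    # "1.3.0" -> ("1", "3"); patch releases are treated as compatible (an assumption).
    return tuple(version.split(".")[:2])


def save_model(model, path: Path, library_version: str) -> None:
    # Write the version first, as its own pickle, followed by the model.
    with open(path, "wb") as f:
        pickle.dump(library_version, f)
        pickle.dump(model, f)


def load_model(path: Path, current_version: str):
    with open(path, "rb") as f:
        # The version is read and checked before the model is unpickled,
        # so an incompatible model is never passed to the unpickler.
        saved_version = pickle.load(f)
        if _major_minor(saved_version) != _major_minor(current_version):
            raise ModelVersionError(
                f"model was saved with version {saved_version}, "
                f"but version {current_version} is installed; retrain the model"
            )
        return pickle.load(f)
```

This does not make old models loadable. It only replaces an obscure failure with an explicit one.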

juhoinkinen · 2023-08-15T08:56:43Z

Old stwfsa models won't work with the updated scikit-learn either. The error message is the same as for MLLM models.

osma · 2023-08-16T06:41:02Z

I'm seeing this new TensorFlow/Keras warning for test_backend_nn_ensemble.py:

UserWarning: You are saving your model as an HDF5 file via `model.save()`. This file format is considered legacy. We recommend using instead the native Keras format, e.g. `model.save('my_model.keras')`.

It seems there is a new Keras format (.keras) available, which is nowadays the recommended one for saving Keras models. We are using the legacy HDF5 (.h5) format:

Annif/annif/backend/nn_ensemble.py

Line 100 in 40cc2fd

MODEL_FILE = "nn-model.h5"

What should we do about this for Annif 1.0?

1. Nothing, just ignore the warning for now
2. Switch to the Keras format in a simplistic way (basically just switching the file extension to .keras on the above line)
3. Switch to .keras but additionally add fallback code that can load existing .h5 models as well
4. Same as 3, but additionally add a deprecation warning that the fallback support will be removed in Annif 1.1.

I don't like 1, because it postpones an inevitable problem. Option 2 breaks compatibility with previously trained models, but provides a fresh start. Option 3 adds a few lines of extra code (and tests), while option 4 is more work but also "promises" to remove that extra code in the future.
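
To make the difference between options 3 and 4 concrete, here is a minimal sketch of the file-selection part of such a fallback. It only decides which file to load and leaves the actual Keras loading out. The `.keras` file name and the helper `find_model_file` are assumptions for illustration, not existing Annif code.

```python
import warnings
from pathlib import Path

MODEL_FILE = "nn-model.keras"      # assumed new file name
LEGACY_MODEL_FILE = "nn-model.h5"  # the current name in nn_ensemble.py


def find_model_file(datadir: Path) -> Path:
    """Return the model file to load, preferring the new format."""
    new_path = datadir / MODEL_FILE
    if new_path.exists():
        return new_path
    legacy_path = datadir / LEGACY_MODEL_FILE
    if legacy_path.exists():
        # Option 4 only: tell the user that the fallback is temporary.
        warnings.warn(
            f"{LEGACY_MODEL_FILE} uses the legacy HDF5 format; "
            f"retrain the project to switch to {MODEL_FILE}",
            DeprecationWarning,
            stacklevel=2,
        )
        return legacy_path
    raise FileNotFoundError(f"no model file found in {datadir}")
```

Dropping the `warnings.warn` call gives option 3. Dropping the whole legacy branch gives option 2.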

sonarqubecloud · 2023-08-16T07:01:34Z

Kudos, SonarCloud Quality Gate passed!

0 Bugs
0 Vulnerabilities
0 Security Hotspots
0 Code Smells

No Coverage information
0.0% Duplication

juhoinkinen · 2023-08-16T07:31:01Z

Option 3 seems best to me. Better promise as little as possible.

osma · 2023-08-16T09:48:52Z

I initially implemented option 3 in PR #730, but the fallback code (especially the unit test) got messy. I then realized that there is not much use in supporting old pre-1.0 NN ensemble models when MLLM and STWFSA models will break anyway due to changes in scikit-learn.

juhoinkinen added 8 commits August 14, 2023 11:42

Pin to Flask 2.2.*

82240e4

Upgrade to Flask-cors 4.0.*

f0ac918

Upgrade to gunicorn 21.2.*

b7aeac4

Upgrade to joblib 1.3.*

c13b11c

Upgrade to scikit-learn 1.3.*

5cecc47

Upgrade to spacy 3.6.*

4b5d659

Upgrade to tensorflow-cpu 2.13.*

0f76e46

Pin to rdflib 6.3.*

39e783a

juhoinkinen added the maintenance label Aug 14, 2023

juhoinkinen added this to the 1.0 milestone Aug 14, 2023

juhoinkinen added 2 commits August 15, 2023 09:44

Upgrade to optuna 3.3.*

8d03a91

Switch to use optuna method trial.suggest_float()

cc0bcd8

Old method trial.suggest_uniform() is deprecated

juhoinkinen linked an issue Aug 15, 2023 that may be closed by this pull request

Update to Optuna 3.x to get rid of deprecation message #532

Closed

juhoinkinen marked this pull request as ready for review August 15, 2023 10:02

This was referenced Aug 15, 2023

Allow rdflib versions 7.*? zbw/stwfsapy#48

Closed

Python 3.11 support #727

Merged

fix scikit-learn FutureWarning about LinearSVC dual parameter

e037b78

osma mentioned this pull request Aug 16, 2023

Switch to Keras v3 save format for nn_ensemble #730

Merged

juhoinkinen merged commit 1c30cd5 into main Aug 16, 2023

juhoinkinen deleted the update-dependencies-v1.0 branch August 16, 2023 09:50
