Batch suggest in ensemble backends #677

Merged (8 commits) on Mar 7, 2023

@juhoinkinen juhoinkinen commented Mar 3, 2023

This adds the _suggest_batch method to ensemble backends.

In all ensemble backends (simple, PAV, NN), the suggestions from the source projects are now fetched with batched suggest calls. The batching happens at the project level, so whether a given source backend actually processes the documents as a batch depends on that backend.
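As a rough illustration of the idea (not the code in this PR), the sketch below calls each source once for the whole batch and then merges the results per document as a weighted average. The names `KeywordProject` and `ensemble_suggest_batch` and the dict-based result format are hypothetical and are not Annif's API.

```python
from typing import Dict, List, Tuple


class KeywordProject:
    """Hypothetical toy source project: a subject scores 1.0 if its
    label occurs in the text. Not an Annif class."""

    def __init__(self, subjects: List[str]) -> None:
        self.subjects = subjects

    def suggest_batch(self, texts: List[str]) -> List[Dict[str, float]]:
        # Return one {subject: score} dict per input text.
        return [
            {s: 1.0 for s in self.subjects if s in text.lower()}
            for text in texts
        ]


def ensemble_suggest_batch(
    sources: List[Tuple[KeywordProject, float]], texts: List[str]
) -> List[Dict[str, float]]:
    """Merge batched suggestions from weighted sources.

    Each source is called exactly once with the whole batch; the
    per-document scores are then combined as a weighted average.
    """
    total_weight = sum(weight for _, weight in sources)
    merged: List[Dict[str, float]] = [dict() for _ in texts]
    for project, weight in sources:
        batch_results = project.suggest_batch(texts)  # one call per source
        for doc_scores, result in zip(merged, batch_results):
            for subject, score in result.items():
                doc_scores[subject] = doc_scores.get(subject, 0.0) + weight * score
    return [
        {subject: score / total_weight for subject, score in doc.items()}
        for doc in merged
    ]
```

With two toy sources weighted 1.0 and 3.0, a subject suggested by both keeps its full score, while a subject suggested only by the lighter source is scaled down to a quarter.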

In the NN ensemble, the prediction (now done via _merge_hit_sets_from_sources) is performed on the whole batch of base suggestions in a single call of the NN model. The prediction uses the model's __call__() method instead of predict(), as that is the recommended way for "small numbers of inputs that fit in one batch" and it should offer better performance; this should also fix #674.
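The difference between predicting per document and predicting once per batch can be sketched without any NN library. `CountingModel` below is a hypothetical plain-Python stand-in for the model (it only averages the source scores and counts its invocations); it is not Keras and not the NN ensemble code in this PR.

```python
class CountingModel:
    """Hypothetical stand-in for the NN model.

    Calling it on a batch averages the per-source score vectors of each
    document and records how many times the model was invoked.
    """

    def __init__(self):
        self.calls = 0

    def __call__(self, batch):
        # batch: list of documents, each a list of per-source score vectors
        self.calls += 1
        return [
            [sum(column) / len(column) for column in zip(*doc)]
            for doc in batch
        ]


def predict_one_by_one(model, batch):
    # One model invocation per document.
    return [model([doc])[0] for doc in batch]


def predict_batched(model, batch):
    # A single model invocation for the whole batch.
    return model(batch)
```

Both functions return the same scores; the batched variant just invokes the model once instead of once per document, which is where any per-call overhead is saved.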

The EnsembleOptimizer now also uses batched suggest. A quick test of the hyperopt command on an ensemble project gives very similar weights and best NDCG scores before and after this PR.
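For readers unfamiliar with the metric, here is a minimal, generic sketch of NDCG with binary relevance. It is an assumption-laden illustration: the function names are made up, and Annif's own NDCG computation may differ in details such as cutoffs or relevance weighting.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of a ranked list of relevance values."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances, start=1)
    )


def ndcg(ranked_subjects, gold_subjects, k=None):
    """NDCG with binary relevance: a suggested subject is relevant
    if it appears in the gold standard set."""
    ranked = ranked_subjects if k is None else ranked_subjects[:k]
    gains = [1.0 if subject in gold_subjects else 0.0 for subject in ranked]
    # The ideal ranking lists all gold subjects first (up to the cutoff).
    n_ideal = len(gold_subjects) if k is None else min(len(gold_subjects), k)
    ideal = dcg([1.0] * n_ideal)
    return dcg(gains) / ideal if ideal > 0 else 0.0
```

A ranking that places all gold subjects at the top scores 1.0, and relevant subjects that appear lower in the list contribute progressively less.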

This PR somewhat improves the performance of the NN ensemble suggest functionality, while the results remain identical or nearly identical (I think there were small differences in suggestion scores on my laptop). I think most of the speedup comes from using the model's __call__ instead of predict.

The results below are from runs on kj-kk using the current Finto AI YSO NN ensemble model (with MLLM, fastText and Omikuji base projects).

suggest

Targeting 6 times tests/corpora/archaeology/fulltext/*.txt:

| | user time (s) | wall time (m:ss) | max rss |
|---|---|---|---|
| before (master) | 152.59 | 2:35.10 | 13755484 |
| after (PR) | 135.19 | 2:17.27 | 13757912 |

eval

Targeting 200 documents from kirjaesittelyt2021/yso/fin/test:

With 1 job

| | user time (s) | wall time (m:ss) | max rss | F1@5 |
|---|---|---|---|---|
| before (master) | 174.79 | 2:57.70 | 13816688 | 0.4431 |
| after (PR) | 156.12 | 2:36.77 | 13796932 | 0.4431 |

With 4 jobs

| | user time (s) | wall time (m:ss) | max rss | F1@5 |
|---|---|---|---|---|
| before (master) | 188.21 | 1:45.61 | 13639148 | 0.4431 |
| after (PR) | 172.5 | 1:43.78 | 13606508 | 0.4431 |
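
To put the timings above into relative terms, the following small script computes the percentage reduction in user and wall time for each run. The helper names are made up for this illustration; the numbers are copied from the results above.

```python
def to_seconds(wall: str) -> float:
    """Convert a 'm:ss.ss' wall-clock string to seconds."""
    minutes, seconds = wall.split(":")
    return int(minutes) * 60 + float(seconds)


def reduction_pct(before: float, after: float) -> float:
    """Relative reduction from before to after, in percent."""
    return 100.0 * (before - after) / before


# (before, after) pairs taken from the results in this PR description
runs = {
    "suggest": {"user": (152.59, 135.19), "wall": ("2:35.10", "2:17.27")},
    "eval, 1 job": {"user": (174.79, 156.12), "wall": ("2:57.70", "2:36.77")},
    "eval, 4 jobs": {"user": (188.21, 172.5), "wall": ("1:45.61", "1:43.78")},
}

for name, measured in runs.items():
    user = reduction_pct(*measured["user"])
    wall = reduction_pct(*(to_seconds(w) for w in measured["wall"]))
    print(f"{name}: user time -{user:.1f}%, wall time -{wall:.1f}%")
```

By this calculation, suggest and single-job eval use roughly 11% less user and wall time, while with 4 jobs the user time drops by about 8% but the wall time by only about 2%.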

@juhoinkinen juhoinkinen added this to the 0.61 milestone Mar 3, 2023

codecov bot commented Mar 3, 2023

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.01 🎉

Comparison is base (3e8f42f) 99.56% compared to head (f280342) 99.57%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #677      +/-   ##
==========================================
+ Coverage   99.56%   99.57%   +0.01%     
==========================================
  Files          87       87              
  Lines        6158     6157       -1     
==========================================
  Hits         6131     6131              
+ Misses         27       26       -1     
| Impacted Files | Coverage Δ |
|---|---|
| annif/backend/ensemble.py | 100.00% <100.00%> (ø) |
| annif/backend/nn_ensemble.py | 100.00% <100.00%> (+0.70%) ⬆️ |
| annif/suggestion.py | 100.00% <100.00%> (ø) |
| annif/util.py | 98.57% <100.00%> (ø) |

@juhoinkinen juhoinkinen changed the title Batch suggest in NN ensemble Batch suggest in ensemble backends Mar 3, 2023
@juhoinkinen juhoinkinen marked this pull request as ready for review March 6, 2023 10:40
@osma osma left a comment

Looks good, I gave a couple of suggestions for improvement but they could also be done separately.

sonarqubecloud bot commented Mar 7, 2023

Kudos, SonarCloud Quality Gate passed!

0 Bugs (rating A)
0 Vulnerabilities (rating A)
0 Security Hotspots (rating A)
0 Code Smells (rating A)

No Coverage information
0.0% Duplication

@juhoinkinen juhoinkinen merged commit eb437a8 into master Mar 7, 2023
@juhoinkinen juhoinkinen deleted the batching-in-nn-ensemble-suggestions branch March 7, 2023 10:56
@juhoinkinen juhoinkinen commented

I reran the NN ensemble suggest and eval timings and updated the results tables after the last change: it seemed to give a few more seconds of improvement.

Development

Successfully merging this pull request may close these issues.

Memory leak in NN ensemble backend