Batch suggest in ensemble backends #677

juhoinkinen · 2023-03-03T10:37:57Z

This adds the _suggest_batch method to ensemble backends.

All ensemble backends (simple, PAV, NN) now fetch the suggestions from their source projects using batched suggest calls. The batching happens at the project level, so whether a given source backend actually processes the batch in one go depends on that backend.
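To illustrate the idea, here is a minimal sketch of a batched ensemble suggest. It is not the code in this PR: FakeProject, its suggest method, and ensemble_suggest_batch are hypothetical stand-ins, and the weighted-average merge is only an assumption about how a simple ensemble might combine scores.

```python
from collections import defaultdict


class FakeProject:
    """Hypothetical stand-in for a source project (not Annif's real API)."""

    def __init__(self, scores_per_doc):
        self._scores = scores_per_doc
        self.batch_calls = 0

    def suggest(self, texts):
        # One call handles the whole batch and returns one
        # {subject: score} dict per input text.
        self.batch_calls += 1
        return [self._scores[text] for text in texts]


def ensemble_suggest_batch(sources, texts):
    """Merge suggestions from (project, weight) pairs for a batch of texts."""
    total_weight = sum(weight for _, weight in sources)
    merged = [defaultdict(float) for _ in texts]
    for project, weight in sources:
        # A single batched call per source project, not one call per text.
        for doc_idx, hits in enumerate(project.suggest(texts)):
            for subject, score in hits.items():
                merged[doc_idx][subject] += score * weight / total_weight
    return [dict(doc_scores) for doc_scores in merged]


a = FakeProject({"doc1": {"cats": 0.8}, "doc2": {"dogs": 0.6}})
b = FakeProject({"doc1": {"cats": 0.4, "dogs": 0.2}, "doc2": {"dogs": 1.0}})
result = ensemble_suggest_batch([(a, 1.0), (b, 1.0)], ["doc1", "doc2"])
```

Each source project is called exactly once for the two documents; whether the project does anything smarter than looping over the texts internally is up to its backend.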

In the NN ensemble, the prediction (now done via _merge_hit_sets_from_sources) is performed on the whole batch of base suggestions in a single call of the NN model. The prediction uses the model's __call__() method instead of predict(), as that is the recommended way for "small numbers of inputs that fit in one batch" and should offer better performance; this should also fix #674.
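The difference between per-document and whole-batch prediction can be sketched without Keras. CountingModel below is a hypothetical stand-in for the trained model (it just averages the source scores and counts how often it is invoked); with a real Keras model the single call would be model(batch_array) rather than model.predict(batch_array).

```python
class CountingModel:
    """Hypothetical stand-in for a trained model: averages source scores."""

    def __init__(self):
        self.calls = 0

    def __call__(self, batch):
        # batch: list of documents; each document is a list of
        # per-source score vectors of equal length.
        self.calls += 1
        return [
            [sum(column) / len(column) for column in zip(*doc)]
            for doc in batch
        ]


def predict_per_document(model, batch):
    # Per-document pattern: one model invocation for each document.
    return [model([doc])[0] for doc in batch]


def predict_whole_batch(model, batch):
    # Batched pattern: one invocation covering every document.
    return model(batch)


batch = [
    [[0.8, 0.0], [0.4, 0.2]],  # doc 1: two sources, two subjects
    [[0.0, 0.6], [0.0, 1.0]],  # doc 2
]

per_doc_model = CountingModel()
batched_model = CountingModel()
per_doc_out = predict_per_document(per_doc_model, batch)
batched_out = predict_whole_batch(batched_model, batch)
```

Both patterns give the same scores here, but the batched one invokes the model once instead of once per document, which is where the fixed per-call overhead is saved.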

The EnsembleOptimizer now also uses batched suggest. A quick test of the hyperopt command on an ensemble project gives very similar weights and best NDCG scores before and after this PR.

This PR somewhat improves the performance of the NN ensemble suggest functionality, while the results remain identical or nearly so (I think there were small differences in suggestion scores on my laptop). I think most of the improvement comes from using the model's __call__ instead of predict.

The results below are from runs at kj-kk using the current Finto AI YSO NN ensemble model (which has MLLM, fastText and Omikuji base projects).

suggest

Targeting 6 times tests/corpora/archaeology/fulltext/*.txt:

	user time	wall time	max rss
before (master)	152.59	2:35.10	13755484
after (PR)	135.19	2:17.27	13757912

eval

Targeting 200 documents from kirjaesittelyt2021/yso/fin/test:

With 1 job

	user time	wall time	max rss	F1@5
before (master)	174.79	2:57.70	13816688	0.4431
after (PR)	156.12	2:36.77	13796932	0.4431

With 4 jobs

	user time	wall time	max rss	F1@5
before (master)	188.21	1:45.61	13639148	0.4431
after (PR)	172.5	1:43.78	13606508	0.4431


codecov · 2023-03-03T10:45:33Z

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.01 🎉

Comparison is base (3e8f42f) 99.56% compared to head (f280342) 99.57%.

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #677      +/-   ##
==========================================
+ Coverage   99.56%   99.57%   +0.01%     
==========================================
  Files          87       87              
  Lines        6158     6157       -1     
==========================================
  Hits         6131     6131              
+ Misses         27       26       -1

Impacted Files	Coverage Δ
annif/backend/ensemble.py	`100.00% <100.00%> (ø)`
annif/backend/nn_ensemble.py	`100.00% <100.00%> (+0.70%)`	⬆️
annif/suggestion.py	`100.00% <100.00%> (ø)`
annif/util.py	`98.57% <100.00%> (ø)`


osma

Looks good. I gave a couple of suggestions for improvement, but they could also be done separately.

annif/backend/ensemble.py

sonarqubecloud · 2023-03-07T09:44:44Z

Kudos, SonarCloud Quality Gate passed!

0 Bugs
0 Vulnerabilities
0 Security Hotspots
0 Code Smells

No Coverage information
0.0% Duplication

juhoinkinen · 2023-03-07T10:58:46Z

I reran the NN ensemble suggest and eval timings and updated the results tables after the last change: it seemed to give an improvement of a few more seconds.

juhoinkinen added 3 commits March 3, 2023 11:32

Support _suggest_batch operation in NN ensemble backend

16a58bb

Make the prediction on the batch in one call of the NN model

Use model's __call__ method for predictions

ba733c2

Use batch suggest calls to base projects

f46c2ad

juhoinkinen added the enhancement label Mar 3, 2023

juhoinkinen added this to the 0.61 milestone Mar 3, 2023

juhoinkinen added 3 commits March 3, 2023 13:39

Implement _suggest_batch in ensemble backend

1bfc738

Use batched suggest calls in EnsembleOptimizer

5e39256

Fix for forgotten function rename

5bdfb71

juhoinkinen changed the title ~~Batch suggest in NN ensemble~~ Batch suggest in ensemble backends Mar 3, 2023

Make docstring & loop var more informative

42e1afc

juhoinkinen marked this pull request as ready for review March 6, 2023 10:40

osma approved these changes Mar 6, 2023


Turn WeightedSuggestion to WeightedSuggestionsBatch

f280342

juhoinkinen merged commit eb437a8 into master Mar 7, 2023

juhoinkinen deleted the batching-in-nn-ensemble-suggestions branch March 7, 2023 10:56

juhoinkinen mentioned this pull request Mar 10, 2023

Switch default git branch to main #679

Closed
