Batch suggest in ensemble backends #677
Conversation
Make the prediction on the batch in one call of the NN model
Codecov Report — Patch coverage:

```diff
@@            Coverage Diff             @@
##           master     #677      +/-   ##
==========================================
+ Coverage   99.56%   99.57%   +0.01%
==========================================
  Files          87       87
  Lines        6158     6157       -1
==========================================
  Hits         6131     6131
+ Misses         27       26       -1
```
☔ View full report at Codecov.
Looks good, I gave a couple of suggestions for improvement but they could also be done separately.
Kudos, SonarCloud Quality Gate passed! 0 Bugs, no coverage information.
I reran the NN ensemble suggest and eval timings and updated the results table after the latest change: it seemed to give a few seconds' further improvement.
This adds the `_suggest_batch` method to ensemble backends. In all ensemble backends (simple, PAV, NN), the suggestions from the source projects are fetched using batched suggest calls (this happens at the project level, so whether a backend actually uses batched suggest depends on the backend).
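As a rough illustration of the batching pattern (the class and method bodies below are simplified assumptions for illustration, not Annif's actual API), a backend base class can provide a per-document fallback for `_suggest_batch`, which backends that support true batching then override:

```python
class ToyBackend:
    """Simplified stand-in for an Annif backend (illustrative only)."""

    def _suggest(self, text):
        # Dummy per-document "suggestion": just count words in the text.
        return {"n_words": len(text.split())}

    def _suggest_batch(self, texts):
        # Naive fallback: one _suggest call per document. A backend that
        # can process many documents at once (e.g. the NN ensemble with a
        # single model call) overrides this method.
        return [self._suggest(text) for text in texts]


backend = ToyBackend()
results = backend._suggest_batch(["first doc", "second longer doc"])
```

The point of routing everything through the batch method is that callers (including the ensemble backends fetching source suggestions) always present a whole batch, and each backend decides how efficiently it can handle it.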
In the case of the NN ensemble, the prediction (now via `_merge_hit_sets_from_sources`) is performed on the whole batch of base suggestions in a single call of the NN model. The prediction is made using the model's `__call__()` method instead of `predict()`, as that is the recommended way for "small numbers of inputs that fit in one batch" and should offer better performance; this should also fix #674.

The EnsembleOptimizer was also changed to use batched suggest. Quickly testing the `hyperopt` command on an ensemble project gives very similar weights and best NDCG scores before and after this PR.

This PR somewhat improves the performance of the NN ensemble suggest functionality, while the results remain (only nearly?) identical (I think there were small differences in suggestion scores on my laptop). I think most of the increase comes from using `__call__` of the model instead of `predict`.

The results below are from runs at kj-kk using the current Finto AI YSO NN ensemble model (having MLLM, fastText and Omikuji base projects).
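To sketch what batched merging buys (a hand-rolled example assuming NumPy; the function name and the fixed weighted average stand in for the trained NN model and are not Annif's actual implementation), the per-source score matrices for the whole batch are stacked and combined in one vectorized call instead of once per document:

```python
import numpy as np

def merge_hit_sets_batch(source_scores, weights):
    """Combine score matrices from several source projects in one call.

    source_scores: list of arrays, one per source project, each shaped
        (batch_size, n_subjects).
    weights: one float per source.

    Returns a (batch_size, n_subjects) array of merged scores; a weighted
    average stands in here for the NN model's single batched prediction.
    """
    stacked = np.stack(source_scores)            # (n_sources, batch, n_subjects)
    w = np.asarray(weights, dtype=float)[:, None, None]
    return (stacked * w).sum(axis=0) / w.sum()

merged = merge_hit_sets_batch(
    [np.array([[1.0, 0.0], [0.0, 1.0]]),   # source A scores for 2 docs
     np.array([[0.0, 1.0], [1.0, 0.0]])],  # source B scores for 2 docs
    weights=[1.0, 3.0],
)
# merged row 0: (1*[1, 0] + 3*[0, 1]) / 4 = [0.25, 0.75]
```

The same shape of computation applies when the merge step is a neural network: feeding the stacked batch through the model once avoids the per-document call overhead, which is where most of the measured speedup appears to come from.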
`suggest`

Targeting 6 times `tests/corpora/archaeology/fulltext/*.txt`:

`eval`

Targeting 200 documents from `kirjaesittelyt2021/yso/fin/test`:

With 1 job

With 4 jobs