Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor SubjectSuggestion to store subject_id - not uri, label, notation #604

Merged
merged 11 commits into from
Aug 10, 2022

Conversation

osma
Copy link
Member

@osma osma commented Aug 10, 2022

This PR is a pretty major refactoring and simplification (hopefully!) of how subject suggestions are represented internally in Annif. There are two main representations, VectorSuggestionResult and ListSuggestionResult. The latter consists of SubjectSuggestion objects, which is a named tuple that has until now contained the URI, label and notation of the suggested subject as well as its suggestion score.

This PR simplifies SubjectSuggestion so that apart from the score, it only contains a numeric subject ID instead of all the subject attributes (uri, label, notation). The subject ID can be looked up from the SubjectIndex when more information about the subject is needed for example in the CLI and REST API. Since subject IDs are already used in many backends as well as other functionality, this usually makes the code simpler. It is also now easier to convert between vector and list representations of suggestions, since both now use numeric subject IDs and there is no need to rely on a SubjectIndex for the conversions.

There are also related changes in the AnnifVocabulary and SubjectIndex classes, for example the SubjectIndex now makes use of the Subject namedtuple instead of plain tuples, which makes its API a bit cleaner.

I have not yet tested this extensively but instead I've relied on the unit tests, which have of course been modified accordingly. I would expect this to bring small improvements in efficiency.

@osma osma added this to the 0.59 milestone Aug 10, 2022
@osma osma self-assigned this Aug 10, 2022
@codecov
Copy link

codecov bot commented Aug 10, 2022

Codecov Report

Merging #604 (ce48891) into master (954a8d9) will increase coverage by 0.03%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #604      +/-   ##
==========================================
+ Coverage   99.52%   99.55%   +0.03%     
==========================================
  Files          86       86              
  Lines        5695     5673      -22     
==========================================
- Hits         5668     5648      -20     
+ Misses         27       25       -2     
Impacted Files Coverage Δ
annif/backend/backend.py 100.00% <ø> (ø)
annif/backend/mllm.py 100.00% <ø> (ø)
annif/backend/omikuji.py 97.56% <ø> (-0.06%) ⬇️
annif/backend/svc.py 100.00% <ø> (ø)
annif/backend/tfidf.py 98.79% <ø> (-0.02%) ⬇️
tests/test_eval.py 100.00% <ø> (ø)
annif/backend/dummy.py 100.00% <100.00%> (ø)
annif/backend/ensemble.py 100.00% <100.00%> (ø)
annif/backend/fasttext.py 100.00% <100.00%> (ø)
annif/backend/http.py 100.00% <100.00%> (ø)
... and 28 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@sonarqubecloud
Copy link

SonarCloud Quality Gate failed.    Quality Gate failed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot E 27 Security Hotspots
Code Smell A 4 Code Smells

No Coverage information No Coverage information
0.0% 0.0% Duplication

@osma osma marked this pull request as ready for review August 10, 2022 11:20
@osma osma requested a review from juhoinkinen August 10, 2022 11:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants