Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make vocabularies multilingual #600

Merged
merged 12 commits into from
Aug 8, 2022
Merged

Conversation

osma
Copy link
Member

@osma osma commented Aug 4, 2022

This PR implements #559 - making vocabularies multilingual, so that there is no need to use separate language-specific vocabulary id's such as yso-fi, yso-sv and yso-en. Instead the vocabulary id yso can be used for all projects and the loadvoc command needs to be executed just once as it will detect which languages are available in the vocabulary and load the labels in all available languages.

It's also possible to define/override the language of labels, for example to use vocab=lcsh(en) in a Finnish language project.

The changes need to be carefully tested as they are quite disruptive. Documentation (including the Annif tutorial) should be updated, mainly by stripping language suffixes from vocabulary id's in examples. However, old examples (vocabulary id's with a language suffix) should still keep working.

@osma osma added this to the 0.59 milestone Aug 4, 2022
@osma osma self-assigned this Aug 4, 2022
@codecov
Copy link

codecov bot commented Aug 4, 2022

Codecov Report

Merging #600 (b258873) into master (b6a1363) will decrease coverage by 0.01%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #600      +/-   ##
==========================================
- Coverage   99.54%   99.52%   -0.02%     
==========================================
  Files          86       86              
  Lines        5653     5695      +42     
==========================================
+ Hits         5627     5668      +41     
- Misses         26       27       +1     
Impacted Files Coverage Δ
annif/cli.py 99.63% <100.00%> (ø)
annif/corpus/skos.py 100.00% <100.00%> (ø)
annif/corpus/subject.py 100.00% <100.00%> (ø)
annif/project.py 99.39% <100.00%> (ø)
annif/vocab.py 95.50% <100.00%> (-0.15%) ⬇️
tests/conftest.py 100.00% <100.00%> (ø)
tests/test_cli.py 100.00% <100.00%> (ø)
tests/test_project.py 100.00% <100.00%> (ø)
tests/test_vocab.py 100.00% <100.00%> (ø)
tests/test_vocab_skos.py 100.00% <100.00%> (ø)
... and 1 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@osma osma force-pushed the issue559-multilingual-vocabularies branch from b1aa810 to 4d78bb5 Compare August 5, 2022 13:12
@osma
Copy link
Member Author

osma commented Aug 5, 2022

Rebased on current master (which now contains PR #597 that was the starting point of this branch) and force-pushed.

@sonarqubecloud
Copy link

sonarqubecloud bot commented Aug 5, 2022

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 6 Code Smells

No Coverage information No Coverage information
0.0% 0.0% Duplication

@osma osma marked this pull request as ready for review August 5, 2022 15:23
@osma
Copy link
Member Author

osma commented Aug 5, 2022

This is more or less done now, pending review.

I tried to fix the issues reported by QA tools. Code Climate still complains about load_vocabulary, but I can't figure out how to make it better. Codecov says there's one more missed line than before in annif.vocab, but I can't find it in the detailed report.

@osma osma requested a review from juhoinkinen August 5, 2022 15:25
@lgtm-com
Copy link

lgtm-com bot commented Aug 5, 2022

This pull request introduces 1 alert when merging b258873 into b6a1363 - view on LGTM.com

new alerts:

  • 1 for Module is imported with 'import' and 'import from'

Copy link
Member

@juhoinkinen juhoinkinen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@osma
Copy link
Member Author

osma commented Aug 8, 2022

Thanks for reviewing @juhoinkinen ! I will merge this now, although we still need to do more testing before the next release, however there are other related changes to the vocabulary functionality coming up (possibly e.g. #602) and it makes sense to test them all in one go.

@osma osma merged commit 954a8d9 into master Aug 8, 2022
@osma osma deleted the issue559-multilingual-vocabularies branch August 8, 2022 07:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants