Restore ability to use vocab language different from project language #613

Merged: 3 commits merged on Aug 24, 2022

@osma (Member) commented Aug 24, 2022

The ability to use a vocabulary language different from the project language was implemented in PR #600 but accidentally broken in PR #608. For example, it should be possible to use vocab=lcsh(en) in a project with language=fi: all documents are in Finnish, but English-language labels are used for LCSH concepts (which don't even have Finnish labels), both when reading corpora and when outputting results.
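
As a rough illustration of the setting described above, the sketch below splits a vocab value such as lcsh(en) into a vocabulary ID and a language, falling back to the project language when no language is given. The function name parse_vocab_spec and the regular expression are assumptions made for this example; they are not taken from Annif's code.

```python
import re


def parse_vocab_spec(vocab_spec, project_language):
    """Split a spec like 'lcsh(en)' into (vocab_id, vocab_language).

    Hypothetical helper for illustration only. Falls back to the
    project language when the spec has no language in parentheses.
    """
    match = re.fullmatch(r"([\w-]+)(?:\((\w+)\))?", vocab_spec.strip())
    if match is None:
        raise ValueError(f"Invalid vocabulary specification: {vocab_spec!r}")
    vocab_id, language = match.groups()
    return vocab_id, language or project_language


# Finnish project, LCSH labels read and written in English
print(parse_vocab_spec("lcsh(en)", "fi"))
# No explicit language: the project language is used
print(parse_vocab_spec("yso", "fi"))
```

The fallback keeps existing configurations working unchanged: a project that never names a vocabulary language behaves as if the two languages were the same.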

This PR aims to restore that functionality by making sure that:

  1. When reading corpora in the directory-based format, labels are compared to vocabulary labels in the vocabulary language;
  2. When performing suggest operations (CLI or REST), the labels of suggested subjects are in the vocabulary language;
  3. When writing an evaluation results file which contains subject labels, the labels are in the vocabulary language.
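
The three cases above can be sketched with a toy label lookup. Everything here is hypothetical (the VOCAB dictionary, the example.org URIs, and both function names) and only shows why the lookup language has to be the vocabulary language rather than the project language.

```python
# Hypothetical toy vocabulary: concept URI -> {language code: label}.
# The concepts only have English labels, as in the LCSH example.
VOCAB = {
    "http://example.org/concept/1": {"en": "Cats"},
    "http://example.org/concept/2": {"en": "Dogs"},
}

PROJECT_LANGUAGE = "fi"  # language of the documents
VOCAB_LANGUAGE = "en"    # language of the subject labels


def uri_by_label(label, language):
    """Item 1: match a corpus label against labels in the given language."""
    for uri, labels in VOCAB.items():
        if labels.get(language) == label:
            return uri
    return None


def label_by_uri(uri, language):
    """Items 2 and 3: label shown in suggestions and evaluation output."""
    return VOCAB[uri].get(language)


# Looking up with the vocabulary language finds the concept ...
print(uri_by_label("Dogs", VOCAB_LANGUAGE))
# ... while the project language finds nothing (no Finnish labels exist)
print(uri_by_label("Dogs", PROJECT_LANGUAGE))
# Output labels also come from the vocabulary language
print(label_by_uri("http://example.org/concept/1", VOCAB_LANGUAGE))
```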

Currently, unit tests verify item 2 above, but not items 1 or 3.

Also, some of the test vocabularies were renamed and repurposed to better match current needs.

@osma osma added the bug label Aug 24, 2022
@osma osma added this to the 0.59 milestone Aug 24, 2022
@osma osma self-assigned this Aug 24, 2022
@sonarqubecloud

Kudos, SonarCloud Quality Gate passed!

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 1 (rating A)
Coverage: no coverage information
Duplication: 0.0%

@codecov bot commented Aug 24, 2022

Codecov Report

Merging #613 (26c134d) into master (c291930) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master     #613   +/-   ##
=======================================
  Coverage   99.58%   99.58%           
=======================================
  Files          87       87           
  Lines        5840     5850   +10     
=======================================
+ Hits         5816     5826   +10     
  Misses         24       24           

Impacted Files           Coverage Δ
annif/cli.py             99.63% <ø> (ø)
annif/rest.py            100.00% <ø> (ø)
tests/test_cli.py        100.00% <100.00%> (ø)
tests/test_config.py     100.00% <100.00%> (ø)
tests/test_project.py    100.00% <100.00%> (ø)
tests/test_rest.py       100.00% <100.00%> (ø)


@osma osma marked this pull request as ready for review August 24, 2022 07:48
@osma (Member, Author) commented Aug 24, 2022

I think this is good enough for now. I need to get moving on the load-vocabulary command (#602), which will likely touch some of the same code anyway.

@osma osma merged commit 576c7b7 into master Aug 24, 2022
@osma osma deleted the fix-vocab-language branch August 24, 2022 07:49