
Updated script to run with the SpacyNER tagger and REL linker #1226

Merged: @lintool merged 3 commits into castorini:master on Jul 10, 2022.


gsgoncalves (Contributor)

Refactored the entity linking (EL) script to support the updated REL code.
I followed the REL authors' suggestions for creating a new Mention Detection pipeline.
It assumes that REL is installed via pip from their repo, as instructed.
Tested on a small portion of docs00.json (the first 100 lines), which is generated when pre-processing msmarco-passage for indexing with Pyserini.
Like the original code, this approach is very RAM-intensive. Batching and multiprocessing could be added to improve the pipeline's performance.
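The description leaves batching and multiprocessing as possible follow-ups. A minimal sketch of that idea in Python follows, assuming docs00.json holds one JSON object per line with `id` and `contents` fields (the layout Pyserini uses for JSON collections). The helper names (`iter_docs`, `batched`, `link_batch`, `run`) are hypothetical and not part of the PR; `link_batch` is only a placeholder for the SpaCy NER + REL step and does not call REL. Passing `limit=100` to the reader mirrors the 100-line test sample.

```python
import json
from itertools import islice
from multiprocessing import Pool
from typing import Callable, Iterable, Iterator, List, Optional


def iter_docs(path: str, limit: Optional[int] = None) -> Iterator[dict]:
    """Yield one JSON object per line; limit=100 mimics the 100-line test sample."""
    with open(path, encoding="utf-8") as f:
        for line in islice(f, limit):  # limit=None reads the whole file
            if line.strip():  # skip blank lines
                yield json.loads(line)


def batched(items: Iterable, size: int) -> Iterator[list]:
    """Group an iterable into lists of at most `size` items."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def link_batch(docs: List[dict]) -> List[dict]:
    """HYPOTHETICAL placeholder for the SpaCy NER + REL linking step.

    It does not call REL; it returns each doc id with an empty entity
    list so the batching logic can be exercised on its own.
    """
    return [{"id": d["id"], "entities": []} for d in docs]


def run(
    docs: Iterable[dict],
    batch_size: int = 32,
    workers: int = 1,
    fn: Callable[[List[dict]], List[dict]] = link_batch,
) -> Iterator[dict]:
    """Apply `fn` batch by batch, optionally across worker processes."""
    batches = batched(docs, batch_size)
    if workers <= 1:
        # Sequential path: only one batch is materialized at a time.
        for batch in batches:
            yield from fn(batch)
    else:
        # Parallel path: batches go to a process pool; imap keeps results in input order.
        with Pool(processes=workers) as pool:
            for result in pool.imap(fn, batches):
                yield from result
```

For example, `run(iter_docs("docs00.json", limit=100), batch_size=16, workers=4)` would push the 100-line sample through four processes in batches of 16. With `workers > 1`, `fn` has to be a module-level function so it can be pickled, and the call should sit under an `if __name__ == "__main__":` guard. Since each worker would load its own models, the memory cost per process is something to measure before raising `workers`.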

@lintool (Member) commented Jun 29, 2022

hey @arjenpdevries - can you get someone from your team to take a look at this?

@chriskamphuis (Collaborator) left a comment

It looks like the file now consists of a mixture of spaces and tabs. Could you make that consistent (spaces)?

The PR looks good in terms of what you did :) Thank you!
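The indentation fix requested above can be scripted. A minimal sketch follows, assuming 4-column tab stops (an assumption; it should match how the file was actually written). The function names are hypothetical and not part of the PR. Only leading whitespace is rewritten, so tabs inside string literals or comments further along a line are left alone.

```python
def has_mixed_indentation(source: str) -> bool:
    """Return True if leading whitespace uses both tabs and spaces somewhere in the file."""
    kinds = set()
    for line in source.splitlines():
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent:
            kinds.add("tab")
        if " " in indent:
            kinds.add("space")
    return kinds == {"tab", "space"}


def normalize_indentation(source: str, tabsize: int = 4) -> str:
    """Expand tabs to spaces in each line's leading whitespace only (ASSUMES tabsize=4)."""
    out = []
    for line in source.splitlines(keepends=True):
        body = line.lstrip(" \t")
        indent = line[: len(line) - len(body)]
        out.append(indent.expandtabs(tabsize) + body)
    return "".join(out)
```

Python's standard library also ships `tabnanny` (`python -m tabnanny scripts/entity_linking.py`), which reports ambiguous indentation without modifying the file.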

Review thread on scripts/entity_linking.py (outdated, resolved)
…ction file. Fixed positions that refer to the original position of the mentions in the original text.
@lintool merged commit f553d43 into castorini:master on Jul 10, 2022
@lintool (Member) commented Jul 10, 2022

@gsgoncalves thanks for your contribution!

3 participants