Consider replacing SPARQL pre/post-processing scripts by ROBOT pluggable commands #1169

Open
gouttegd opened this issue Jan 13, 2025 · 3 comments

@gouttegd
Contributor

One of the steps performed when preparing an import module is running the `src/sparql/preprocess-module.ru` SPARQL update, which simply adds a `dc:source` ontology annotation whose value is the ontology's version IRI:

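```sparql
PREFIX dc:  <http://purl.org/dc/elements/1.1/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

# Copy the ontology's version IRI into a dc:source annotation.
INSERT {
  ?ontology dc:source ?version_iri .
}
WHERE {
  ?ontology rdf:type owl:Ontology ;
            owl:versionIRI ?version_iri .
}
```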

Running a SPARQL query with ROBOT requires (1) converting the entire in-memory ontology from the OWLAPI model to the Jena model, (2) actually running the query on the Jena model, (3) serialising the Jena model to Turtle, and (4) parsing the serialised Turtle back into an OWLAPI model. Doing all of that for a task that could be done in 3 lines of Java code is a huge waste of resources.
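To make the "3 lines of Java" point concrete, here is a minimal, stdlib-only sketch of what the update does. It assumes a deliberately simplified, hypothetical model of an ontology header (`OntologyHeader` and `Annotation` are made up for illustration and are not OWLAPI or ROBOT types), and it assumes `dc:` expands to Dublin Core Elements 1.1. A real plugin would do the same thing directly on the OWLAPI ontology (for instance, by reading the version IRI from the ontology ID and applying an `AddOntologyAnnotation` change), with no roundtrip through Jena and Turtle.

```java
import java.util.LinkedHashSet;
import java.util.Optional;
import java.util.Set;

// Entry point declared first so that `java PreprocessSketch.java` runs it.
class PreprocessSketch {
    // Full IRI behind dc:source (assumption: dc: = Dublin Core Elements 1.1).
    static final String DC_SOURCE = "http://purl.org/dc/elements/1.1/source";

    /**
     * Same effect as preprocess-module.ru on the hypothetical header model:
     * if the ontology declares a version IRI, add a dc:source annotation with
     * that IRI as its value. Without a version IRI, nothing is added.
     */
    static void addSourceAnnotation(OntologyHeader header) {
        header.versionIri().ifPresent(v ->
                header.annotations().add(new Annotation(DC_SOURCE, v)));
    }

    public static void main(String[] args) {
        // Hypothetical ontology; the version IRI is invented for illustration.
        OntologyHeader header = new OntologyHeader(
                Optional.of("http://example.org/my-ont/releases/2025-01-13/my-ont.owl"),
                new LinkedHashSet<>());
        addSourceAnnotation(header);
        System.out.println(header.annotations());
    }
}

// Hypothetical stand-ins for OWLAPI objects; NOT real OWLAPI or ROBOT types.
record Annotation(String property, String value) {}

record OntologyHeader(Optional<String> versionIri, Set<Annotation> annotations) {}
```

Storing the annotations in a set mirrors the RDF semantics of the SPARQL update: inserting a triple that is already present is a no-op, so running the step twice does not duplicate the annotation.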

In fact, the cost of that process is a big reason why some import modules need to be marked as “large”, so that we can selectively avoid that step for those modules:

```make
$(ROBOT) \
    {% if ont.is_large %}extract -i $< -T $(IMPORTDIR)/{{ ont.id }}_terms_combined.txt --copy-ontology-annotations true --force true --method BOT \
    {% else %}query -i $< --update ../sparql/preprocess-module.ru extract -T $(IMPORTDIR)/{{ ont.id }}_terms_combined.txt --copy-ontology-annotations true --force true --method BOT {% endif %}
```

A simple ROBOT plugin could easily replace the SPARQL query. This would make the preprocessing much faster for all modules (since we avoid the whole roundtrip to JENA and back), and would remove the need for treating “large” imports differently, which would in turn make the Makefile template easier to read and maintain.

The same applies to the post-processing step, which simply strips all ontology annotations except the `dc:source` annotation.
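The post-processing step is equally small. The sketch below assumes a hypothetical `OntologyAnnotation` record standing in for an OWLAPI ontology annotation (not a real OWLAPI type), and again assumes `dc:` expands to Dublin Core Elements 1.1; it only illustrates the filtering logic such a pluggable command would apply.

```java
import java.util.ArrayList;
import java.util.List;

// Entry point declared first so that `java PostprocessSketch.java` runs it.
class PostprocessSketch {
    // Full IRI behind dc:source (assumption: dc: = Dublin Core Elements 1.1).
    static final String DC_SOURCE = "http://purl.org/dc/elements/1.1/source";

    /** Removes every ontology annotation whose property is not dc:source. */
    static void keepOnlySource(List<OntologyAnnotation> annotations) {
        annotations.removeIf(a -> !DC_SOURCE.equals(a.property()));
    }

    public static void main(String[] args) {
        // Hypothetical header annotations copied over from the source ontology.
        List<OntologyAnnotation> annotations = new ArrayList<>(List.of(
                new OntologyAnnotation("http://purl.org/dc/terms/title", "My Ontology"),
                new OntologyAnnotation("http://purl.org/dc/terms/license",
                        "https://creativecommons.org/licenses/by/4.0/"),
                new OntologyAnnotation(DC_SOURCE,
                        "http://example.org/my-ont/releases/2025-01-13/my-ont.owl")));
        keepOnlySource(annotations);
        System.out.println(annotations);
    }
}

// Hypothetical stand-in for an OWLAPI ontology annotation; NOT a real OWLAPI type.
record OntologyAnnotation(String property, String value) {}
```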

@gouttegd gouttegd self-assigned this Jan 13, 2025
@matentzn
Contributor

I fully agree, this would be great. Any thoughts on avoiding a proliferation of ROBOT plugins? Maybe an ODK-utils one for such cases?

@gouttegd
Contributor Author

> maybe an ODK-utils one for such cases?

That’s what I was thinking of, yes. Since it would be used in standard ODK workflows, an ODK plugin would definitely be appropriate.

And such a plugin, once it exists, could become the landing site for other pluggable commands that were initially created elsewhere but turned out to be useful enough to be reused across several repositories (such as FlyBase’s `rewrite-definition`, which was initially written for FlyBase but is now also used by CL).

@matentzn
Contributor

Sounds great!


2 participants