A version control system for RDF datasets
Ontogen is a version control system. Although it takes a lot of inspiration from Git, it is something quite different. Whereas Git, Mercurial, SVN, etc. are all version control systems for source code, Ontogen has a different versioning subject: datasets. These are datasets that were created or generated by someone or something, then reviewed by someone or something, then edited by someone or something, and so on. So, while we can say that Git, Mercurial and SVN are SCM (Source Control Management) solutions, Ontogen would be a DCM (Data Control Management) solution.
But it's not about datasets in general. The versioned subjects of Ontogen are the graphs of an RDF dataset stored in a SPARQL-star-compliant triple store. In an Ontogen repository on such a store, however, we have a situation similar to Git: instead of a single .git directory containing the version history of the code, we have a single, separate named graph containing the version history of the dataset.
Ontogen takes a holistic approach to RDF data versioning by considering not just the syntactic and semantic levels but also the pragmatic level: the acts of producing and changing the data.
Key features of Ontogen include:
- Provenance metadata generation for changes in the triple store
- Incorporation of speech act utterances of RDF statements
- Integration with PROV and DCAT models
While Ontogen aims to provide a robust version control system for RDF datasets, it's important to note its current limitations:
- Single Graph Support: The current version only supports versioning of individual graphs within an RDF dataset. Versioning of multi-graph datasets is not yet implemented.
- Cryptic Graph Names: Due to the current implementation, graph names are automatically generated UUID URIs and cannot be changed.
- Limited Configuration Updates: After the initial repository setup, there is currently no way to update the repository metadata and configuration from the configuration files on the file system and sync them with their copy in the store.
- Performance with Large Datasets: Ontogen is not yet suitable for versioning large datasets. Adding substantial amounts of data in a single commit can hit query size limits in some triple stores. Additionally, certain queries become prohibitively slow with very large datasets (be sure to use the latest version to at least prevent timeouts).
I'm actively working on addressing these limitations in future versions. The first three will be addressed during the current follow-up funding period from the NLnet Foundation. For now, Ontogen is best suited for small to medium-sized datasets and experimental use.
Note for end users: This is the core library, which can be used to integrate Ontogen into your Elixir apps. If you're looking to use Ontogen as a command-line tool, please check out the Ontogen CLI repository.
To use Ontogen, you'll need:
- Elixir v1.15+ & Erlang/OTP v24+
- A SPARQL-star-compatible triple store (currently only Fuseki and Oxigraph are officially supported)
Add Ontogen to your list of dependencies in mix.exs:
def deps do
  [
    {:ontogen, "~> 0.1"}
  ]
end
Then run:
$ mix deps.get
Here's a basic example of how to use Ontogen with its CLI:
$ mkdir example
$ cd example
$ og init --adapter Oxigraph
Initialized empty Ontogen repository in /Users/JohnDoe/example
$ og setup
Set up Ontogen repository
$ og add data.ttl
$ og commit --message "Initial commit"
[(root-commit) 6fc09c94768204983d0409d28e0796ec3f17cef46e57c5cb1248424d3922040d] Initial commit
3 insertions, 0 deletions, 0 overwrites
$ og log --changes
6fc09c9476 - Initial commit (just now) <John Doe john.doe@example.com>
<http://www.example.org/employee38>
+ <http://www.example.org/familyName> "Smith" ;
+ <http://www.example.org/firstName> "John" ;
+ <http://www.example.org/jobTitle> "Assistant Designer" .
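The commit summary above reports insertions, deletions and overwrites of statements. As a rough illustration of the first two notions only, here is a hypothetical Elixir sketch (not Ontogen's implementation or API; the module name TripleDiff is made up) that treats a graph as a set of triples, with plain strings standing in for IRIs and literals, and computes a change as two set differences. Overwrites are deliberately left out, since their exact semantics are not described here.

```elixir
defmodule TripleDiff do
  @moduledoc """
  Hypothetical illustration only, not Ontogen's implementation or API.
  Triples are modeled as {subject, predicate, object} tuples of plain strings,
  so IRIs and literals are not distinguished.
  """

  @doc "Returns the triples inserted and deleted when going from `old` to `new`."
  def diff(old, new) do
    old_set = MapSet.new(old)
    new_set = MapSet.new(new)

    %{
      # statements present only in the new snapshot
      insertions: MapSet.difference(new_set, old_set) |> MapSet.to_list() |> Enum.sort(),
      # statements present only in the old snapshot
      deletions: MapSet.difference(old_set, new_set) |> MapSet.to_list() |> Enum.sort()
    }
  end

  @doc "Summarizes a diff as counts, similar in spirit to the commit summary line."
  def summary(%{insertions: ins, deletions: del}) do
    "#{length(ins)} insertions, #{length(del)} deletions"
  end
end

# The three statements from the log output above
ex = "http://www.example.org/"

after_commit = [
  {ex <> "employee38", ex <> "familyName", "Smith"},
  {ex <> "employee38", ex <> "firstName", "John"},
  {ex <> "employee38", ex <> "jobTitle", "Assistant Designer"}
]

# Going from an empty graph to the committed one
changes = TripleDiff.diff([], after_commit)
IO.puts(TripleDiff.summary(changes))
# prints: 3 insertions, 0 deletions
```

Diffing the empty graph against the three statements from data.ttl yields three insertions and no deletions, matching those two counts in the commit summary. Changing a value, such as the job title, would show up in this simplified model as one deletion plus one insertion.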
For more examples, please refer to the User Guide.
Planned features include:
- Support for multiple graphs in an RDF dataset
- Implement branching and merging capabilities
- Support for more triple stores
- ...
See the open issues for a full list of proposed features (and known issues).
Marcel Otto - @marcelotto@mastodon.social - @MarcelOttoDE - marcelotto@gmx.de
This project is funded through NGI Assure, a fund established by NLnet with financial support from the European Commission's Next Generation Internet program.
JetBrains supports the project with complimentary access to its development environments.
Distributed under the MIT License. See LICENSE.md for more information.