This project is part of the @thi.ng/umbrella monorepo.
Extensible Graph Format.
Striving to be both easily readable & writable for humans and machines, this line based, plain text data format and package supports:
- Definition of different types of graph based data (e.g. RDF-style or Labeled Property Graph topologies)
- Full support for cyclic references, arbitrary order (automatic forward declarations)
- Choice of inlining referenced nodes for direct access or via special node ref values
- Arbitrary property values (extensible via tagged literals and custom tag parsers a la EDN)
- Optionally prefixed node and property IDs with (also optional) auto-expansion via declared prefixes (for Linked Data use cases)
- Inclusion of sub-graphs from external files
- Loading of individual property values from referenced file paths
- Optionally GPG encrypted property values (where needed)
- Multi-line values
- Line comments
- Configurable parser behavior & syntax feature flags
- Hand-optimized parser, largely regexp free
- Configurable GraphViz DOT export
(Source for this example graph is further below)
The following parsers for tagged property values are available by default. Custom parsers can be provided via config options.
Tag | Description | Result |
---|---|---|
#base64 |
Base64 encoded binary data | Uint8Array |
#date |
Date.parse() compatible string (e.g. ISO8601) |
Date |
#file |
File path to read value from | string |
#gpg |
Calls gpg to decrypt given armored string |
string |
#hex |
hex 32bit int (no prefix) | number |
#json |
Arbitrary JSON value | any |
#list |
Whitespace separated list | string[] |
#num |
Floating point value (IEEE754) | number |
Note: In this reference implementation, the #file
and #gpg
tag parsers
are only available in NodeJS.
ALPHA - bleeding edge / work-in-progress
Search or submit any issues for this package
You're strongly encouraged to update to at least v0.4.0 to avoid the potential
of arbitrary code execution in older versions when decrypting #gpg
-tagged
property values. A security advisory will be published ASAP. A fix has been
deployed already.
(Non-exhaustive list)
- VSCode syntax highlighting
- JSON -> EGF conversion
- Async tag parsing
- URL support for
#file
tag - Tag declarations & tag parser import from URL (needs trust config opts)
-
#md
tag parser for markdown content -
#gpg
fallback behavior options
yarn add @thi.ng/egf
ES module import:
<script type="module" src="https://cdn.skypack.dev/@thi.ng/egf"></script>
For Node.js REPL:
# with flag only for < v16
node --experimental-repl-await
> const egf = await import("@thi.ng/egf");
Package sizes (gzipped, pre-treeshake): ESM: 2.85 KB
- @thi.ng/api
- @thi.ng/associative
- @thi.ng/checks
- @thi.ng/dot
- @thi.ng/errors
- @thi.ng/logger
- @thi.ng/prefixes
- @thi.ng/strings
- @thi.ng/transducers-binary
TODO - Full docs forthcoming...
- api.ts - Data structures & options
- dot.ts - Graphviz export (via @thi.ng/dot)
- parser.ts - Main parser
- tags.ts - Tagged value parsers
; file: readme.egf
; prefix declaration (optional feature)
@prefix thi: thi.ng/
; a single node/subject definition
; properties are indented
; `thi:` prefix will be expanded
thi:egf
type project
; tagged value property (here: node ref)
part-of -> thi:umbrella
status alpha
description Extensible Graph Format
url https://thi.ng/egf
creator -> toxi
; multi-line value
; read as whitespace separated list/array (via #list)
tag #list >>>
graph
extensible
format
linked-data
<<<
thi:umbrella
type project
url https://thi.ng/umbrella
creator -> toxi
toxi
type person
name Karsten Schmidt
location London
account -> toxi@twitter
account -> postspectacular@gh
toxi@twitter
type account
name @toxi
url http://twitter.com/toxi
postspectacular@gh
type account
name @postspectacular
url http://github.com/postspectacular
import { parseFile } from "@thi.ng/egf";
// enable prefix expansion in parser
const graph = parseFile("readme.egf", { opts: { prefixes: true } }).nodes;
console.log(Object.keys(graph));
// [
// 'thi.ng/egf',
// 'thi.ng/umbrella',
// 'toxi',
// 'toxi@twitter',
// 'postspectacular@gh'
// ]
console.log(graph.toxi);
// {
// '$id': 'toxi',
// type: 'person',
// name: 'Karsten Schmidt',
// location: 'London',
// account: [
// {
// '$ref': 'toxi@twitter',
// deref: [Function: deref],
// equiv: [Function: equiv]
// },
// {
// '$ref': 'postspectacular@gh',
// deref: [Function: deref],
// equiv: [Function: equiv]
// }
// ]
// }
// in this example inlining of referenced nodes is disabled (default)
// therefore refs are encoded as objects implementing the `IDeref` interface
// to obtain the referenced node
console.log(graph.toxi.account[0].deref());
// {
// '$id': 'toxi@twitter',
// type: 'account',
// name: '@toxi',
// url: 'http://twitter.com/toxi'
// }
EGF is a UTF-8 plain text format and largely line based, though supports multi-line values. An EGF file consists of node definitions, each with zero or more properties and their (optionally tagged) values. EGF does not prescribe any other schema or structure and it's entirely up to the user to e.g. allow properties themselves to be defined as nodes with their own properties, thus allowing the definition of LPG (Labeled Property Graph) topologies as well.
; Comment line
; First node definition
node1
; property with string value
prop1 value
; property with reference to another node
prop2 -> node2
; property with tagged value
prop3 #tag value
prop4 <<< long, potentially
multiline
value >>>
prop5 #tag <<< tagged multi-line value >>>
node2
; property comment
prop1 value
...
A full grammar definition is forthcoming. In the meantime, please see a somewhat outdated older version and related comments in #234 for more details.
Properties with reference values to another node constitute edges in the graph.
References are encoded via property -> nodeid
.
The following graph defines two nodes with circular references between them.
Each node has a literal (string, by default) property name
and a reference
property knows
to another node (via its ID). The order of references is
arbitrary and the parser will automatically produce forward declarations for
nodes not yet known.
alice
name Alice
knows -> bob
bob
name Robert
knows -> alice
Using default parser options, this produces an object as follows. Note, the
references are encoded as objects with a $ref
property and implement the
IDeref
and IEquiv
interfaces defined in the
@thi.ng/api
package.
{
alice: {
'$id': 'alice',
name: 'Alice',
knows: {
'$ref': 'bob',
deref: [Function: deref],
equiv: [Function: equiv]
}
},
bob: {
'$id': 'bob',
name: 'Robert',
knows: {
'$ref': 'alice',
deref: [Function: deref],
equiv: [Function: equiv]
}
}
}
// access bob's name via alice
graph.alice.knows.deref().name
// "Robert"
If node resolution is enabled (via the resolve
option) in the parser, the
referenced nodes will be inlined directly and produce circular references in the
JS result object. In many cases this more desirable and fine, however will stop
the graph from being serializable to JSON (for example).
{
alice: <ref *1> {
'$id': 'alice',
name: 'Alice',
knows: { '$id': 'bob', name: 'Robert', knows: [Circular *1] }
},
bob: <ref *2> {
'$id': 'bob',
name: 'Robert',
knows: <ref *1> {
'$id': 'alice',
name: 'Alice',
knows: [Circular *2]
}
}
}
To enable namespacing and simplify re-use of existing data vocabularies, we're
borrowing from existing Linked Data formats & tooling to allow node and property
IDs to be defined in a prefix:name
format alongside @prefix
declarations.
Such prefix IDs will be expanded during parsing and usually form complete URIs,
but could expand to any string. The various (50+) commonly used Linked Data
vocabulary prefixes bundled in
@thi.ng/prefixes
are available by default, though can be overridden, of course...
; prefix declaration
@prefix thi: http://thi.ng/
thi:toxi
rdf:type -> foaf:person
Result:
{
'thi.ng/toxi': {
'$id': 'thi.ng/toxi',
'http://www.w3.org/1999/02/22-rdf-syntax-ns#type': {
'$id': 'http://xmlns.com/foaf/0.1/person'
}
},
'http://xmlns.com/foaf/0.1/person': {
'$id': 'http://xmlns.com/foaf/0.1/person'
}
}
Currently in NodeJS only, external graph definitions can be included in the main
graph via the @include
directive. Any @prefix
declarations in the included
file will only be available in that file, however will inherit any pre-existing
prefixes declared in the main file.
Relative file paths will be relative to the path of the currently processed file:
|- include
| |- sub1.egf
| |- sub2.egf
|- main.egf
(These examples make use of the schema.org ontology)
; main.egf
; declare an empty prefix
@prefix : http://thi.ng/
@include include/sub1.egf
; use empty prefix for this node
:toxi
rdf:type -> schema:Person
; sub1.egf
@include sub2.egf
:sub1.egf
rdf:type -> schema:Dataset
schema:dateCreated #date 2020-07-19
; sub2.egf
:sub2.egf
rdf:type -> schema:Dataset
schema:creator -> :toxi
Parsing the main.egf
file (with node resolution/inlining and pruning) produces:
{
'http://thi.ng/sub2.egf': {
'$id': 'http://thi.ng/sub2.egf',
'http://www.w3.org/1999/02/22-rdf-syntax-ns#type': { '$id': 'http://schema.org/Dataset' },
'http://schema.org/creator': {
'$id': 'http://thi.ng/toxi',
'http://www.w3.org/1999/02/22-rdf-syntax-ns#type': { '$id': 'http://schema.org/Person' }
}
},
'http://thi.ng/toxi': {
'$id': 'http://thi.ng/toxi',
'http://www.w3.org/1999/02/22-rdf-syntax-ns#type': { '$id': 'http://schema.org/Person' }
},
'http://thi.ng/sub1.egf': {
'$id': 'http://thi.ng/sub1.egf',
'http://www.w3.org/1999/02/22-rdf-syntax-ns#type': { '$id': 'http://schema.org/Dataset' },
'http://schema.org/dateCreated': 2020-07-19T00:00:00.000Z
}
}
Complying JS objects can be converted to EGF using the toEGF()
function. This
function takes an iterable of
Node
objects, optional prefix mappings and an optional property serialization
function to deal with custom tagged values. The default property formatter
(toEGFProp()
) handles various values for built-in tags and can be used in
combination with any additional user provided logic.
import { rdf, schema } from "@thi.ng/prefixes";
const res = toEGF([
{
$id: "thi:egf",
"rdf:type": { $ref: "schema:SoftwareSourceCode" },
"schema:isPartOf": { $id: "http://thi.ng/umbrella" },
"schema:dateCreated": new Date("2020-02-16")
},
{
$id: "thi:umbrella",
"rdf:type": { $ref: "schema:SoftwareSourceCode" },
"schema:programmingLanguage": "TypeScript"
}
],
// prefix mappings (optional)
{
thi: "http://thi.ng/",
schema,
rdf
}
// property serializer (optional)
toEGFProp
);
@prefix thi: http://thi.ng/
@prefix schema: http://schema.org/
@prefix rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
thi:egf
rdf:type -> schema:SoftwareSourceCode
schema:isPartOf -> thi:umbrella
schema:dateCreated #date 2020-02-16T00:00:00.000Z
thi:umbrella
rdf:type -> schema:SoftwareSourceCode
schema:programmingLanguage TypeScript
Karsten Schmidt
If this project contributes to an academic publication, please cite it as:
@misc{thing-egf,
title = "@thi.ng/egf",
author = "Karsten Schmidt",
note = "https://thi.ng/egf",
year = 2020
}
© 2020 - 2021 Karsten Schmidt // Apache Software License 2.0