This is super-alpha.
PEGs (Parsing Expression Grammars) are more powerful than regexes, compose better, and are expressible using PODS (Plain Ol' Data Structures).
Deterministic parsers are a simpler model than those that produce ambiguity.
pex is implemented with a Virtual Machine, just like its inspiration LPEG.
Grammars are input as a quoted data structure, just like Datomic queries.
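;; A var named Number would clash with java.lang.Number, which Clojure
;; imports by default, so the grammar is bound to number-grammar instead.
(def number-grammar
  '{number     [digits (? fractional) (? exponent)]
    fractional ["." digits]
    exponent   ["e" (? (/ "+" "-")) digits]
    digits     [(class num) (* (class num))]})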
Each key in the map is the name of a rule, and the corresponding value is that rule's definition. Any bare symbol inside a definition is a call to the rule with that name. Calls are not wrapped in parentheses; a parenthesized form denotes a special operator such as ?, *, / or class.
Grammars are then compiled, much like a java.util.regex.Pattern. The compiled grammar can then be run against inputs.
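For readers who have not used java.util.regex.Pattern directly, the compile-once, run-many workflow that the analogy refers to looks roughly like this on the regex side. This is plain Java interop, not pex's API: the regex is a hand-written approximation of the number grammar above, and matches? is a hypothetical helper name.

```clojure
(import 'java.util.regex.Pattern)

;; Compile once. The regex approximates the number grammar:
;; digits, an optional fractional part, an optional exponent.
(def number-pattern
  (Pattern/compile "[0-9]+(\\.[0-9]+)?(e[+-]?[0-9]+)?"))

;; Run many times. matches? is a hypothetical helper, not part of pex.
(defn matches? [^Pattern p ^String input]
  (.matches (.matcher p input)))

(matches? number-pattern "42")        ;; => true
(matches? number-pattern "3.14e-10")  ;; => true
(matches? number-pattern "3.")        ;; => false
```

The expensive step (building the matcher) happens once, and the resulting object is reused for every input.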
String and char literals match literally:
"foo"
Ordered Choice is the most important operation in a PEG. Rule B will be attempted only if A fails:
(/ A B)
NB: A must not be a prefix of B if you want B to ever match:
;; invalid, foo will always win over foobar
(/ "foo" "foobar")
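The effect of ordering can be sketched in plain Clojure by modelling an ordered choice between string literals. first-matching-literal is a hypothetical helper written for this illustration, not a pex function.

```clojure
;; Hypothetical helper, not part of pex: try each literal in order and
;; return the first one the input starts with, or nil if none do.
(defn first-matching-literal [literals ^String input]
  (first (filter #(.startsWith input ^String %) literals)))

;; "foo" is a prefix of "foobar", so the longer literal is never chosen.
(first-matching-literal ["foo" "foobar"] "foobar") ;; => "foo"

;; Listing the longer alternative first lets both be reachable.
(first-matching-literal ["foobar" "foo"] "foobar") ;; => "foobar"
(first-matching-literal ["foobar" "foo"] "food")   ;; => "foo"
```

In grammar terms, the usual fix is to put the longer alternative first, as in (/ "foobar" "foo").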
Vectors sequence rules together. If you want A, B, and C to succeed one after another:
[A B C]
TODO
capture places the region matched by the rule on the Value Stack:
(capture integer (? fractional) (? exponent))
class refers symbolically to a matcher for a particular character class:
["42" (class alpha)]
There are several helpers that build up character classes. Each character class must be passed into pex/compile as a matcher. TODO: elaborate.
? makes a rule optional:
(? b)
* is repetition, zero or more times:
(* foo)
The typical way to match separator delimited things:
[pattern (* separator pattern)]
action refers to a parse action, immediately invoking it:
(action make-integer)
Actions can manipulate the Value Stack by reducing over the items captured, updating the last item captured, or pushing a value.
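As a conceptual model only, the three kinds of stack manipulation can be pictured with the Value Stack as a Clojure vector. Every function name below is hypothetical, and nothing here reflects how pex implements or exposes actions.

```clojure
;; Conceptual model: the Value Stack as a vector, top of stack at the end.
;; All names are hypothetical; this is not pex's API.

(defn push-value [stack v]
  (conj stack v))

(defn update-last [stack f]
  ;; Replace the top item with (f top).
  (conj (pop stack) (f (peek stack))))

(defn reduce-items [stack n f init]
  ;; Pop the top n items and push the result of reducing over them.
  (let [split (- (count stack) n)
        items (subvec stack split)]
    (conj (subvec stack 0 split) (reduce f init items))))

;; Two captured regions arrive as strings.
(def captured (-> [] (push-value "12") (push-value "34")))

(update-last captured #(Long/parseLong %))
;; => ["12" 34]

(reduce-items captured 2 str "")
;; => ["1234"]
```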
There are also a few pre-built actions that access an efficient StringBuffer for mutation while building up Strings.
EOI means end of input. It matches only when the input is exhausted, not merely when the rest of the grammar has finished matching.
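To make the semantics of the operators above concrete, here is a minimal interpreter sketch for the grammar notation in plain Clojure. It is an illustration only: pex compiles grammars for a virtual machine, so this says nothing about its implementation, and capture, action and macros are not modelled. The names match, parses? and char-classes, and the reading of (class num) as "any digit", are assumptions.

```clojure
(declare match)

;; Assumed character classes, keyed by the symbol used in (class ...).
(def char-classes
  {'num   (fn [c] (Character/isDigit (char c)))
   'alpha (fn [c] (Character/isLetter (char c)))})

(defn- match-seq [grammar rules input pos]
  ;; Every rule must succeed in order; stop at the first failure.
  (reduce (fn [p rule]
            (if-let [next-pos (match grammar rule input p)]
              next-pos
              (reduced nil)))
          pos
          rules))

(defn- match-star [grammar rule input pos]
  ;; Zero or more repetitions; stop on failure or when nothing is consumed.
  (loop [p pos]
    (let [next-pos (match grammar rule input p)]
      (if (and next-pos (> next-pos p))
        (recur next-pos)
        p))))

(defn match
  "Returns the position after a successful match, or nil on failure."
  [grammar rule ^String input pos]
  (cond
    (string? rule) (when (.startsWith input ^String rule (int pos))
                     (+ pos (count rule)))
    (= rule 'EOI)  (when (= pos (count input)) pos)
    (symbol? rule) (match grammar (get grammar rule) input pos)
    (vector? rule) (match-seq grammar rule input pos)
    (seq? rule)
    (let [[op & args] rule]
      (condp = op
        '/     (some #(match grammar % input pos) args)
        '?     (or (match grammar (first args) input pos) pos)
        '*     (match-star grammar (first args) input pos)
        'class (when (and (< pos (count input))
                          ((char-classes (first args)) (.charAt input pos)))
                 (inc pos))
        nil))))

(defn parses? [grammar start input]
  ;; Require the start rule to be followed by end of input.
  (some? (match grammar [start 'EOI] input 0)))

(def number-grammar
  '{number     [digits (? fractional) (? exponent)]
    fractional ["." digits]
    exponent   ["e" (? (/ "+" "-")) digits]
    digits     [(class num) (* (class num))]})

(parses? number-grammar 'number "3.14e-10") ;; => true
(match number-grammar 'number "42abc" 0)    ;; => 2
(parses? number-grammar 'number "42abc")    ;; => false
```

The last two calls show the role of EOI: the number rule finishes after "42", but the parse as a whole fails because input remains.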
User-supplied macros can expand rules to remove boilerplate.
TODO
example
JSON parser