This folder contains experiments used to guide Element development. Currently the focus is on an Element interpreter, written in Python to prioritize quick iteration over run-time performance.
The Ast
class defines an abstract syntax tree (AST) node and its pretty-printer.
An Ast node is a (required) name and (optional) children.
These are accessible by field names (Ast.name
and Ast.children
) or by tuple index (x.name
is x[0]
).
This duality enables pattern matching Ast nodes as tuples.
This module breaks an input string into a list of values, where each value is either a string or a tuple of strings. When the scanner encounters strand notation, it groups all the elements of the strand into a list of strings:
2+3
⇒['2', '+', '3']
1 2+3 4
⇒[['1', '2'], '+', ['3', '4']]
2+`a`b``c
⇒['2', '+', ['`a', '`b', '`', '`c']]
"a" "hello""""world"
⇒['"a"', '"hello"', '""', '"world"']
Numbers separated by spaces form strands, but spaces are optional in symbol or string strands.
A backtick (`
) not followed by a letter or double-quote is equivalent to `""
.
Stranding two strings without spaces as in “hi”“world” results in the list ["hi", "world"]
.
Strands are lightweight and optional syntax to create 1-dimensional lists.
You can create a 1-d list without stranding with the “join” primitive function: 1,2
⇒ ['1', '2']
, while nested lists always require parentheses as in (1;(2;3);4)
.
This parser is a bottom-up style parser originally inspired by the Double-E method. After a while I removed the operator precedence logic from this parser since K operators have uniform precedence.
A large portion of this parser now deals with juxtaposition, projection, and composition within the context of the usual application of values to operators, function calls, and parentheses.
- juxtaposition (implied application)
a b c
⇒(a (b c))
()()
⇒((lst NIL) (lst NIL))
f[x][y]
⇒((f (prg x)) (prg y))
a 2-b 3
⇒(a (- 2 (b 3)))
- projection (partial application)
2+
⇒(prj + 2)
"a",
⇒(prj , "a")
- composition
+-
⇒(cmp + (prj -))
(note-
treated as a partial application with both arguments missing)a+*b-
⇒(cmp (prj + a) (cmp * (prj - b)))
- infix and prefix application
a+*b-1
⇒(+ a (* (- b 1)))
+-^a
⇒(+ (- (^ a)))
The job of the parser is to disambiguate the number of arguments to operators.
For example +2*b-
contains 3 operators (“+”, “*”, and “-“), each of which has either a unary/prefix meaning or a binary/infix meaning depending on context.
However, the above results in the AST (cmp + (cmp (prj * 2) (prj - b)))
because the “rule” is that a trailing operator is infix, and so we can recognize the suffix b-
as a projection (prj - b)
.
From there we work backward and see *
, which is infix because it has a left operand 2
, resulting in another projection (prj * 2)
.
The goal of the parser in this situation is to preserve the fact that -
is missing an argument, so any time a prj
node is composed with other operators, the parent node becomes a composition.
These two projections form a composition, and finally the leading +
must be a prefix operator because it has no left argument.
Again we preserve the partial application information by making the root node a cmp
.
One thing the parser does not do is validate the shape or type of arguments to operands.
For example "ab"/3
is an invalid expression in the language but parses as ((fld "ab") 3)
.
The evaluator will handle this task.
This section is TBD.
- Interesting paper: https://arxiv.org/pdf/1109.0781
- Interesting statically typed compiled language: https://aardappel.github.io/lobster/README_FIRST.html
This file aims to include unit tests for the major parts of the prototype. Currently it has about 100 parser tests comparing a string input with expected Ast output.
This section is TBD but here are some potentially useful links: