A stack based programming language I designed based on Forth and Porth from the Porth programming language series of Tsoding Daily channel. The language is quite similar to Porth, however there are several differences. The compiler was written in Python first, but then rewrote it using Corth language itself. It is now a self hosted language.
The repo consists of the compiler source code in Corth, an already compiled and assembled compiler, standard library and examples.
- Right now, the compiler can only compile in ELF64 format.
- Compiler uses NASM which can be downloaded using a package manager or from its repo.
To compile a Corth program to an ELF64 executable, cd to the directory that contains the corth executable and run:
./corth compile <file-name> -i ./std/
This will create an executable called output which can be directly run.
The compiler can be bootstrapped using the bootstrap
subcommand:
./corth bootstrap ./compiler/ --std ./std/
This will compile the compiler source code and place it at ./corth.
- Corth is a concatinative (stack based) language. Every operation pops its arguments from the end of the stack and pushes its return values to the end of the stack.
- Programs are compiled into NASM format first, then into object and executable files.
- Indentation, spaces, newline characters are ignored by the compiler unless they are inside of a string.
- Names can contain any character except whitespace (space, tab or newline); but can not start with a decimal digit, a dollar sign, single or double quote.
- Language is made up of WORDS, instead of statements. Because of this, the lexer does not create a AST and instead only produces a sequence of tokens. These tokens are then run in order.
- EVERYTHING IN THIS LANGUAGE CAN BE CHANGED AT ANY TIME. USE WITH CAUTION.
// From ./examples/hello_world.corth
include "linux_86x/stdio.corth"
proc main
int int -> int
in let argc argv in
"Hello, World!\n" puts
end 0 end
- This is a simple program that prints 'Hello, World!' when run.
- Compile and run this program with
./corth compile ./examples/hello_world.corth -i ./std.corth && ./output
include
keyword is used to include the library stdio, which contains some basic I/O procedures and macros for writing to/from files and the standard streams.let
keyword is used to 'name' values. In this example, it is used to name the parameter values.proc
keyword is used to define a procedure.main
is where the program starts.- Writing anything in double quotes causes it to be interpreted as a string. A string is just a pointer to a character array and an integer that represents its length.
puts
is defined in the stdio library, which prints the string that is passed to it in the standard output.- For more examples, you can check the ./examples/ directory.
- For more information about these concepts, keep scrolling.
// Push and pop direction of the stack is kept on the right side in this document.
// This will hopefully help understand the concept of stack and 'let' keyword.
34 // Stack = { 34 }
0b101001 // Stack = { 34, 41 }
0o205126 // Stack = { 34, 41, 68182 }
0x5729da // Stack = { 34, 41, 68182, 5712346 }
'a' // Stack = { 34, 41, 68182, 5712346, 97 }
'\n' // Stack = { 34, 41, 68182, 5712346, 97, 10 }
- Numbers push an integer type to the stack. An integer type of stack item stores an 8 byte value.
- Lexer allows binary, octal and hexadecimal integers as well. The prefixes are
0b
,0o
and0x
respectively. - Putting a character between single quotes ('') causes it to be interpreted as an integer, which appends its ASCII value to the stack.
- Escape statements such as
\n
are supported in characters.
"This is a string" // Stack = { 0x648a15, 16 }
"This also is a string.
In Corth, multi-line strings are supported." // Stack = { 0x648a15, 16, 0x648a26, 71 }
- Strings push two items, a pointer to the address and an integer length of the string.
- Escape statements are also supported in strings.
- It is not recommended to modify the contents of strings in program, although it is possible.
34 35 + // Stack = { 69 }
69 27 - // Stack = { 69, 42 }
68 inc // Stack = { 69, 42, 69 }
43 dec // Stack = { 69, 42, 69, 42 }
3 23 * // Stack = { 69, 42, 69, 42, 69 }
85 2 / // Stack = { 69, 42, 69, 42, 69, 42 }
+
sums the last two items and pushes the result back,-
subtracts signed or unsigned integers.*
multiplies and/
divides signed integers. Right now, the compiler does not keep track of the integer 'signeded-ness', so every signed and unsigned operation can be used on any integer type. The unsigned versions of\*
and/
areu\*
andu/
.inc
anddec
are macros defined as1 +
and1 -
in core/arithmetic.corth. They can be used to increment or decrement a number once.
include "str.corth"
- When compiler sees an
include
keyword, it starts to compile the file whose path is given after theinclude
keyword. - Right now, the compiler does not allow including directories. A todo error will be given if tried.
- ./std/ directory contains some useful libraries like core/stack.corth or str.corth.
1 2 3 // Stack = { 1, 2, 3 }
swp // Stack = { 1, 3, 2 }
drop // Stack = { 1, 3 }
dup // Stack = { 1, 3, 3 }
rot // Stack = { 3, 3, 1 }
arot // Stack = { 1, 3, 3 }
- ./std/core/stack.corth contains several macros for stack manipulation.
drop
removes the last item from the stack.dup
duplicates the last item in the stack.swp
swaps the places of the last two items.rot
rotates the places of the last 3 items, by moving the first added to the last position.arot
does the exact opposite of whatrot
does.- These operations are macros defined using let, with names starting with underscore (_). For hard to manage stack operations, it is recommended to use
let
instead of these macros. - There are some others, but it is recommended to check ./std/core/stack.corth for more information since they are a bit more complex.
"Hello, world!\n" puts // Prints 'Hello, world!' to the standard output.
34 35 + eputu " is a nice number.\n" eputs // Prints '69 is a nice number.' to the standard error.
- ./std/linux_x86/stdio.corth contains useful procedures for I/O operations like reading from and writing to streams.
proc arithmetic-average // The name of the procedure is 'arithmetic-average'.
// Procedure takes two integers as arguments, and returns a single integer.
// The leftmost type is the oldest item in the stack.
int int -> int
in
// This is where the code is located.
+ 2 /
end
// This procedure will be run.
proc main
// Right now, only 'int int -> int' argument layout is allowed for the main procedure.
int int -> int
in let argc argv in
"The arithmetic average of 53 and 31 is " puts 53 31 arithmetic-average putu ".\n" puts
end 0 end // Program exits with exit code 0.
proc
defines a procedure, which can be called anywhere in the code.return
can be used to early return from a procedure, but the stack must match with the procedure's output layout.- Because of the way the stack works, procedures can return more than one value unlike most other languages.
- Program starts from the
main
procedure; if it is not defined, the compiler will return an error. - If the code does not require recursion and is simple, it might be better to use a macro depending on the exact requirements.
macro sayHi
// Name of the macro is 'sayHi'. This means when the compiler sees 'sayHi' anywhere in the code, it will convert it to these.
// Takes a string, the name and prints a welcome message.
"Hi, " puts puts "!\n" puts
endmacro
proc main
int int -> int
in let argc argv in
"Josh"
sayHi // This will be converted to this:
// "Hi, " puts puts "!\n" puts
end 0 end
macro
keyword is used to define macros andendmacro
is used to end the definition.- Macros expand at compile time, allowing simplifying and compressing code without losing functionality.
- Macros are only compiled after expanding, so any compile time error that would be caused by a macro is not detected before expansion.
- Using
let
inside a macro is usually a bad idea, although some library macros are designed that way (likedup
,swp
orrot
). If the code requireslet
; either change that macro to a procedure, or name the let variable with names that starts and ends with underscores (_).
-
Two global or two local variables can not have the same name, but if a local and a global variable have the same name, the local one will be reachable until it is removed from the scope.
-
If a name is defined globally (for example with
memory
), it can be reached from anywhere meaning any procedure in and out of the same file. -
If a name is defined locally in a procedure, it can only be reached within the scope that the statement it is in. For example:
// 'x' is undefined here. 69 let x in // 'x' is defined here. end // 'x' is undefined here.
// This is a line comment, it can also come after code
34 35 + putu // Just like that
/*
This is a block comment, aka a multi-line comment.
Block comments can span several lines.
*/
- Comments do not have any affect on the compiled program.
include "linux_x86/stdio.corth"
proc main
int int -> int
in let argc argv in
2 2 + 5 = if
"Well, math is broken. Nice.\n" puts // Hopefully won't be printed.
end
3 4 > if
"Your computer has virus\n" puts // Hopefully won't be printed.
else
"Your computer is alright\n" puts // Hopefully will be printed.
end
// Count from 0 to 9.
"First 10 numbers from 0 are,\n" puts
0 while dup 10 < do // Duplicate the number, and check if it is less then 10.
dup putu "\n" puts // Print the number.
inc end drop // Increase the number.
end 0 end
if
is used for conditions, it runs the code after it only if the last item on the stack is true.if
can be used withelse
for more functionality, which runs only if the bool is false.- Because of the way Corth works, the
if-else
statement can be used like a ternary operator in other languages. while
is used for loops, andbreak
can be used to quit the loop early.- If
if
is used without anelse
, the stack must not change betweenif
andend
since otherwise it will create an ambiguity of what the stack will be after that part. - If
if
is used with anelse
, the stack change betweenif
andelse
must be the same for the code betweenelse
andend
. - Stack must not change between
while
andend
. - Even though parser does not care for indentation, it is still a good idea to indent since in Corth these operations can get very hard to understand very quickly.
- Sadly, Corth does not have
else if/elif
statements at the moment.
// The rightmost variable collects the newest item in the stack. So x = 3 and y = 4.
// From now on, x will be replaced with 3 and y will be replaced with 4.
// If 'let' variables are compile-time constants, the variables will be directly replaced for optimization reasons.
// Otherwise, the stack values will be stored in local memory and loaded inside the structure.
3 4 let x y in
x y * x + y -
end
// Stack = { 10 }
let
keyword stores stack items in local memory. The stored values can not be modified but can be read later.- When any
let
variable is called, it directly returns its value; unlike amemory
variable which returns its address.
// The rightmost variable collects the newest item in the stack. So x = 3 and y = 4.
// From now on, x will be replaced with 3 and y will be replaced with 4.
3 4 peek x y in
x y * x + y -
end
// Stack = { 3, 4, 10 }
peek
works very similar tolet
, except it does not pop the items from the stack. This allows it to directly load them from the stack instead of using the local memory.- When any
peek
variable is called, it directly returns its value; just likelet
.
include "core/memory.corth"
// This is a global variable.
// The size must be a compile-time constant as memory is allocated in compile-time.
memory count sizeof(int) end
proc increase -> in
// Reads the value of 'count', adds one and writes back.
count @64 inc count !64
end
proc main
int int -> int
in let argc argv in
// Set the value of 'count' to 0.
0 count !64
// Increase the value of 'count'.
increase
// Print the value stored at 'count'. Should print '1'.
count @64 putu putnl
memory x sizeof(int) in
memory y sizeof(int) in // Size of the variable 'x' is equal to the size of an integer (8 bytes). Same with variable 'y'.
0 x !64
x @64 puti " is before saving 'x'\n" puts
420 x !64
x @64 puti " is after saving 'x'\n" puts
0 y !64
y @64 puti " is before saving 'y'\n" puts
69 y !64
y @64 puti " is after saving 'y'\n" puts
end end
end 0 end
memory
is used for allocating global or local memory in compile-time.- When a variable name declared by
memory
keyword is called; the address of the variable is returned. - When used globally,
memory
allocates memory that can be used from everywhere after definition. - When used locally,
memory
allocates memory that can be used only in betweenin
andend
. @64
loads 8 bytes of data from the address. (@8
,@16
and@32
are also allowed)!64
stores 8 bytes of data to the address.
include "dynamic/malloc.corth"
// Assume this is inside a procedure.
100 malloc let buffer in // Allocate 100 bytes of memory and save the object pointer as a constant named `buffer`.
// Loop through every byte in `buffer`.
buffer while dup buffer 100 + < do peek address in
0x67 address !8 // Set the byte to 0x67.
end inc end drop
buffer mfree drop // Deallocate the space.
end
- std/dynamic/ directory contains some dynamic memory management utilities, like
malloc
ordjoin
. Please check the library to learn more.
- std/collections/ directory contains different kinds of data structures. Most of these are terribly implemented though :(.