An insanely fast JavaScript toolchain.
WIP
Kataw is a JavaScript toolchain that aims to unify functionality that has previously required separate tools. It features everything from low-level CST manipulation to linting, code analysis, transformation, and minification.
- CST nodes
- CST keywords
- ESNext
- Diagnostics
- Printing
- Linting
- Transformation
- Types
- Comments
- CST parser features
- Current state
- Roadmap
- Future
The toolchain's core is based on an ECMAScript-friendly CST that allows you to parse code according to the ECMAScript® 2022 (ECMA-262 13th Edition) language specification.
If the only goal is to perform syntactic analysis (parsing) of a JavaScript program, you can use either kataw.parseModule or kataw.parseScript.
Note that since ES2015, a JavaScript program can be either a script or a module.
Here is an example of how to set up Kataw to act like, for example, Acorn:
import * as kataw from 'kataw';

// Parse with module goal and throw on the first diagnostic
kataw.parseModule('x = y', { next: true }, function (source, kind, msg, start, end) {
  throw msg + ' (' + start + ', ' + end + ')';
});

// Parse in script mode
kataw.parseScript('x = y', { next: true }, function (source, kind, msg, start, end) {
  throw msg + ' (' + start + ', ' + end + ')';
});
The returned CST can now be used as an AST.
Note that the CST contains more information, which can be extracted from the CST nodes through public API methods.
Many of these APIs let you retrieve information that is not otherwise available from a standard AST parser.
For example, kataw.isStatementNode is all you need to find out whether the current CST node is a statement node. With an AST parser you must use a switch statement with 60 switch cases.
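// With Babel you are forced to do
function isStatement(node) {
  switch (node.type) {
    case 'SwitchStatement':
    case 'ReturnStatement':
    // ...plus one case for every other statement type
      return true;
    default:
      return false;
  }
}

// With Kataw
kataw.isStatementNode(node); // returns true for a statement node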
A second benefit of this CST parser is that it runs in recovery mode by default, so it can be used in any editor. A built-in diagnostic system reports diagnostics if an error handler has been provided. The diagnostics are dynamic: they are informative, and they change based on the context you are parsing in.
Used together, these features give you more options to adjust, modify, and customize the CST than a regular AST parser, while you write fewer lines of code and still get insane performance.
All CST nodes have a kind, which is a number that represents the node type. It plays the same role as the ESTree type property, except that Kataw doesn't do any string comparisons: everything in Kataw is a number.
Here is an example:
if (node.kind === kataw.SyntaxKind.Identifier) {}
You need to use kataw.visitEachChild to traverse the CST and get access to each CST node. After that you can apply any kind of transformation.
Be aware that the kind, and not only the NodeFlags, contains additional information that you can extract through the public API, for example with kataw.isKeyword, kataw.isIdentifier, and kataw.isFutureReserved.
This is possible because there are no tokens in Kataw. Everything is a SyntaxKind: token and kind are merged into one.
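To make the "everything is a number" idea concrete, here is a minimal sketch of how category bits packed into a numeric kind turn questions like "is this a statement?" into a single bitwise AND instead of a switch. The bit layout and the names CategoryBit, DemoKind, isStatementKind, isKeywordKind, and isFutureReservedKind are hypothetical and do not reflect Kataw's actual SyntaxKind values or implementation.

```javascript
// Hypothetical category bits; Kataw's real SyntaxKind layout may differ.
const CategoryBit = {
  Statement: 1 << 16,
  Expression: 1 << 17,
  Keyword: 1 << 18,
  FutureReserved: 1 << 19,
};

// Low bits give each kind a unique id; high bits record its categories.
const DemoKind = {
  ReturnStatement: 1 | CategoryBit.Statement,
  SwitchStatement: 2 | CategoryBit.Statement,
  Identifier: 3 | CategoryBit.Expression,
  ForKeyword: 4 | CategoryBit.Keyword,
  ImplementsKeyword: 5 | CategoryBit.Keyword | CategoryBit.FutureReserved,
};

// Each check is one bitwise AND: no string comparison and no switch.
function isStatementKind(node) {
  return (node.kind & CategoryBit.Statement) !== 0;
}

function isKeywordKind(node) {
  return (node.kind & CategoryBit.Keyword) !== 0;
}

function isFutureReservedKind(node) {
  return (node.kind & CategoryBit.FutureReserved) !== 0;
}
```

With this layout, isStatementKind({ kind: DemoKind.ReturnStatement }) returns true and isStatementKind({ kind: DemoKind.Identifier }) returns false.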
Kataw also exports all CST nodes, so you can create your own nodes. This is handy if you want to try out new ECMAScript features that aren't part of the language yet, or write your own transformers as in Babel.
Here is an example of how to create a CST node:
// creates an identifier
kataw.createIdentifier(/* text */ 'hello', /* rawText */ 'hello', /* start */ 1, /* end */ 5)
Some CST nodes need additional information. This can be set using kataw.NodeFlags, a bitwise mask that can be set on every CST node and CST keyword node.
// create a string literal
const str = kataw.createStringLiteral(
  /* text */ 'hello', /* rawText */ 'hello', /* start */ 1, /* end */ 5
);
// set the flag and mark it as single-quoted, e.g. 'string'
str.flag |= kataw.NodeFlags.SingleQuote;
// check whether the flag is set
kataw.isSingleQuote(str); // true
Every keyword in Kataw is its own CST node, and you create keywords in almost the same way as any other CST node:
kataw.createToken(kataw.SyntaxKind.ForKeyword, kataw.NodeFlags.NoChildren, /* start */ 1, /* end */ 5);
A diagnostic in Kataw is either an error, a warning, or a lint failure.
The diagnostics have been designed so that you can quickly understand what the problem is and correct it.
Adding an error handler as the 3rd argument enables diagnostics. The diagnostics are flexible: you can use them together with Kataw's own reporters, or create your own reporter for whatever your use case is.
Here is how it works:
import { parseScript } from 'kataw';

parseScript('[x', { next: true }, function (diagnosticSource, kind, message, start, end) {
  // Handle or report the diagnostic here
});
Param | Description |
---|---|
diagnosticSource | Either Lexer or Printer |
kind | Either lint, error, or warning |
message | The diagnostic message |
start | The start position of the diagnostic |
end | The end position of the diagnostic |
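Since start and end are positions rather than line and column numbers, a custom reporter usually needs to convert them before showing them to a user. The sketch below is a hypothetical collecting reporter: it follows the (diagnosticSource, kind, message, start, end) order from the table above, assumes start and end are 0-based character offsets into the parsed text, and records every diagnostic instead of throwing on the first one. createDiagnosticCollector and toLineColumn are names made up for this example.

```javascript
// Hypothetical reporter: collects diagnostics instead of throwing on the first one.
// Assumes start/end are 0-based character offsets into sourceText.
function createDiagnosticCollector(sourceText) {
  const diagnostics = [];

  // Convert a character offset into a 1-based line and column.
  function toLineColumn(offset) {
    let line = 1;
    let column = 1;
    for (let i = 0; i < offset && i < sourceText.length; i++) {
      if (sourceText[i] === '\n') {
        line++;
        column = 1;
      } else {
        column++;
      }
    }
    return { line, column };
  }

  // Same parameter order as the error handler described in the table.
  function onDiagnostic(diagnosticSource, kind, message, start, end) {
    diagnostics.push({ diagnosticSource, kind, message, start, end, ...toLineColumn(start) });
  }

  return { diagnostics, onDiagnostic };
}

// Usage (assuming kataw is installed):
// const collector = createDiagnosticCollector('[x');
// parseScript('[x', { next: true }, collector.onDiagnostic);
// collector.diagnostics now holds every reported diagnostic.
```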
Stage 3 proposals can be parsed if the next option is enabled.
Stage 1 and stage 2 proposals are not supported because their spec drafts are still changing all the time.
Kataw has its own type system that is an improvement over TypeScript and Flow, and it conforms to the ECMAScript® 2022 (ECMA-262 13th Edition) language specification.
As with everything else in Kataw, it's developed for high performance and low memory consumption.
It allows you to parse syntax like function x(y: string, z: number): string | number {} and other similar syntax.
The type system is still WIP and will be enabled by default in the CLI together with Kataw's own type checker.
You can enable it manually with the allowTypes option. Kataw will then parse the types, but it will not do any type checking.
You can use kataw.removeKatawTypes to remove Kataw's types from the CST:
const source = kataw.parseModule('let x: string;', { allowTypes: true });
// Remove the types
kataw.removeKatawTypes(source);
Leading and trailing comments can be extracted at the correct position with kataw.getLeadingComments and kataw.getTrailingComments.
Hello
/* I'm a comment */
there!
Getting the trailing comment of Hello can be done like this: kataw.getTrailingComments(5, 24).
It gets the comments from the end value of Hello until the start value of there!.
If you want a 1:1 copy of the actual source code, you can take a "slice" from the start value of Hello to the end value of there!.
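The offset-range idea can be sketched without Kataw: take the end offset of the previous node and the start offset of the next one, slice that range, and pick out the comments inside it. commentsBetween is a hypothetical helper, not Kataw's getTrailingComments. It assumes the range contains only whitespace and comments, which holds between two adjacent tokens, so it does not need to handle strings or regular expressions. The sample source reproduces the Hello/there! example above with a newline between each part.

```javascript
// Hypothetical helper: returns the comments found in source[start, end).
// Assumes the range holds only whitespace and comments (true between two adjacent tokens).
function commentsBetween(source, start, end) {
  const gap = source.slice(start, end);
  const pattern = /\/\*[\s\S]*?\*\/|\/\/[^\n]*/g;
  const comments = [];
  let match;
  while ((match = pattern.exec(gap)) !== null) {
    comments.push({
      text: match[0],
      start: start + match.index,
      end: start + match.index + match[0].length,
    });
  }
  return comments;
}

const source = "Hello\n/* I'm a comment */\nthere!";
const helloEnd = source.indexOf('Hello') + 'Hello'.length; // 5
const thereStart = source.indexOf('there!'); // 26
commentsBetween(source, helloEnd, thereStart);
// → [{ text: "/* I'm a comment */", start: 6, end: 25 }]
```

Because every comment keeps absolute offsets, source.slice(comment.start, comment.end) always gives back the exact original text.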
Rules are still being added, but Kataw can already lint, either through public API methods or through options. Most of ESLint's common or recommended rules also work with Kataw, and you can enable or disable each of them.
It can be done like this:
// Lint through the public API
import { lintScript } from 'kataw';
lintScript('eval()', { reporter: aladdin }, { noEval: true });

// Or enable the rule as a parser option
import { parseScript } from 'kataw';
parseScript('eval()', { noEval: true });
The DiagnosticKind will be set to DiagnosticKind.Lint, and you can choose to ignore this and treat the diagnostic as any other error, or, for example, create your own reporter.
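One way to either treat a lint failure as any other error or keep it separate is a reporter that routes diagnostics by kind. This sketch assumes the handler signature from the diagnostics table and that lint failures arrive with kind === 'lint', as the table lists lint, error, and warning; if Kataw actually passes numeric DiagnosticKind values, the comparison would use DiagnosticKind.Lint instead. createLintRouter and its lintAsError option are hypothetical.

```javascript
// Hypothetical reporter factory. Assumes kind is one of 'lint', 'error', 'warning'.
function createLintRouter({ lintAsError = false } = {}) {
  const errors = [];
  const warnings = [];

  function onDiagnostic(diagnosticSource, kind, message, start, end) {
    const entry = { diagnosticSource, kind, message, start, end };
    if (kind === 'error' || (kind === 'lint' && lintAsError)) {
      // Real errors, plus lint failures when they should be treated as errors.
      errors.push(entry);
    } else {
      // Warnings, plus lint failures when lintAsError is false.
      warnings.push(entry);
    }
  }

  return { errors, warnings, onDiagnostic };
}
```

Passing onDiagnostic as the error handler keeps the decision about lint severity in one place, so the same rules can fail a CI build while only showing hints in an editor.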
Kataw can act the same way as Babel and be a tool that helps you write code in the latest version of JavaScript. This can be done by developing transformers that handle situations where your supported environments don't support certain features natively.
The compiler transforms those features down to a supported version.
You have to use kataw.visitEachChild to traverse the CST. kataw.visitNode can be used to visit a single node, and kataw.visitNodes to visit an array of CST nodes. kataw.visitNodes should only be used on lists, that is, CST nodes that are known to contain an array, so there is no need to check with, for example, Array.isArray whether it's an array.
That way, performance is maintained.
All CST nodes will be updated automatically if any changes have been detected.
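The "update only when something changed" behaviour can be sketched over a plain object tree. mapChildren below is a hypothetical helper written for this example, not Kataw's visitEachChild; it assumes a node shape of { kind, children } and copies a parent only when the visitor returns a different object for at least one child, so unchanged subtrees keep their identity instead of being cloned.

```javascript
// Hypothetical tree shape for this sketch: { kind, value?, children?: Node[] }.
// Returns the same node object when no child changed, otherwise a shallow copy
// of the node with a new children array.
function mapChildren(node, visitor) {
  const children = node.children;
  if (!children || children.length === 0) return node;

  let updated = null;
  for (let i = 0; i < children.length; i++) {
    const next = visitor(children[i]);
    if (next !== children[i]) {
      // Copy the array lazily, only on the first change.
      if (updated === null) updated = children.slice();
      updated[i] = next;
    }
  }
  return updated === null ? node : { ...node, children: updated };
}
```

A recursive transform would call mapChildren again from inside its visitor, and a caller can tell whether anything changed with a plain !== check on the returned root.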
Keywords can also be swapped, and so can the operators of AssignmentExpression, BinaryExpression, UnaryExpression, and UpdateExpression. For example, !== can be changed to ===.
A WithStatement can be transformed into a WhileStatement simply by changing the value of the TokenNode.
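As a rough illustration of swapping an operator, here is a sketch over a hypothetical plain-object node that stores its operator as a string. In Kataw the operator is a CST keyword node with a numeric kind, so the real change would be made to that node's kind; invertEquality and the node shape are assumptions for this example.

```javascript
// Hypothetical node shape: { kind: 'BinaryExpression', operator, left, right }.
const inverseOperator = new Map([
  ['===', '!=='],
  ['!==', '==='],
  ['==', '!='],
  ['!=', '=='],
]);

// Returns a new node with the equality operator inverted,
// or the original node when there is nothing to swap.
function invertEquality(node) {
  if (node.kind !== 'BinaryExpression' || !inverseOperator.has(node.operator)) return node;
  return { ...node, operator: inverseOperator.get(node.operator) };
}
```

Returning the original object when nothing changes keeps the "update only on change" property described above.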
The location of a CST node in the CST can also be changed by changing the values of start and end on the CST node.
Changing the NodeFlags allows you to change how the CST node should behave.
All these things give you better control over the transformation of each CST node compared to Babel and Rome.
Here is an example of a simple transformer that replaces all identifiers with a NumericLiteral:
import * as kataw from 'kataw';

// Replaces every Identifier node in the CST with the NumericLiteral 123
export function swapIdentifierWithNumeric(transform) {
  return transformSourceFile;

  function transformSourceFile(root) {
    // Start the traversal at the root node
    return kataw.visitEachChild(transform, root, visitor);
  }

  function visitor(node) {
    switch (node.kind) {
      case kataw.SyntaxKind.Identifier:
        // Replace the identifier, keeping its original position
        return kataw.createNumericLiteral(
          123,
          '123',
          kataw.NodeFlags.ExpressionNode | kataw.NodeFlags.NoChildren,
          /* start */ node.start,
          /* end */ node.end
        );
      default:
        // Recurse into the children of every other node
        return kataw.visitEachChild(transform, node, visitor);
    }
  }
}
Kataw is adjustable and offers three different ways to print your source code.
The returned source does not include any extra parentheses or unnecessary code.
The comments are 100% correct, and they will be printed in the places you expect.
API | Description |
---|---|
print | Prints a given CST and lets you adjust the diagnostics and set your own parser options |
printModule | Prints the source with module goal |
printScript | Prints the source in script mode |
Here is an example:
import * as kataw from 'kataw';

// Print a parsed CST
kataw.print(kataw.parseModule('x = y', { next: true }, function (source, kind, msg, start, end) {
  throw msg + ' (' + start + ', ' + end + ')';
}));

// Print with module goal
kataw.printModule('x = y');

// Print in script mode
kataw.printScript('x = y');
Statements, blocks, and other code can be ignored in Kataw with a // kataw-ignore comment.
If it is set on a WhileStatement, it will ignore the entire statement, including its BlockStatement.
// kataw-ignore
while (true) {}
You can use kataw.shouldIgnoreNextNode(node) to check whether the node should be ignored.
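A minimal sketch of how such an ignore check could work, assuming the leading comments of a node are available as raw source strings. hasIgnoreDirective is a hypothetical helper, not Kataw's shouldIgnoreNextNode, and it assumes the directive must be a line comment whose text is exactly kataw-ignore after trimming (so both // kataw-ignore and //kataw-ignore match).

```javascript
// Hypothetical helper: true if any leading comment is a `// kataw-ignore` directive.
// Assumes comments are passed as raw source text, e.g. "// kataw-ignore".
function hasIgnoreDirective(leadingComments) {
  return leadingComments.some((comment) => {
    // Only line comments count as directives in this sketch.
    if (!comment.startsWith('//')) return false;
    return comment.slice(2).trim() === 'kataw-ignore';
  });
}
```

A printer built on this idea could call the check before formatting a statement and, when it returns true, emit the statement's original source slice unchanged.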
- Error recovery by default (like Acorn loose), while still reconstructing the CST correctly
- Optional error reporting (requires a callback as the parser's 3rd argument)
- Dynamic error, hint, and warning diagnostics (depending on the context you are parsing in)
- Public API methods to extract info from the CST nodes
- 100% correct comment extraction and attachment algorithm
- Can parse types and type annotations (Kataw has its own type system)
- Can be used in any editor
- Scalable
- Performance
- The CST parser can be used in production
- Parsing ECMA-262 (aka JavaScript), with a stable CST spec
- Test262 passes
- Printing API (like the Prettier API)
- // kataw-ignore (like // prettier-ignore)
- Command line interface (like the Prettier CLI)
- Documentation & website
- Plugin system, to make it possible to support JSX/TS/Flow...
- JSX plugin
- TS plugin
- Transformers: like Babel
- Minify: like uglify-js
- Linter: like ESLint
- A "hook system" for adding additional rules to the linter and the grammar checker will be published.
- Hooks to support experimental syntax and ECMA proposals in a sandboxed environment