Kataw

An insanely fast JavaScript toolchain.


WIP

Kataw is a JavaScript toolchain that aims to unify functionality that previously lived in separate tools. It covers everything from low-level CST manipulation to linting, code analysis, transformation, and minification.

The toolchain's core is based on an ECMAScript-friendly CST that allows you to parse code that follows the ECMAScript® 2022 (ECMA-262 12th Edition) language specification.

If the only goal is to perform syntactic analysis (parsing) of a JavaScript program, you can do this with either kataw.parseModule or kataw.parseScript.

Note that with ES2015 and later a JavaScript program can be either a script or a module.

Here is an example of how to set up Kataw to act like, for example, Acorn:

 // Parse with module goal
 kataw.parseModule('x = y', { next: true }, function(source, kind, msg, line, column) {
    throw msg + '(' + line + ', ' + column + ')';
 });

 // Parse in script mode
 kataw.parseScript('x = y', { next: true }, function(source, kind, msg, line, column) {
    throw msg + '(' + line + ', ' + column + ')';
 });

The returned CST tree can now be used as an AST.

Note that the CST contains additional information that can be extracted from the CST nodes through public API methods.

Many of these APIs have the advantage that they allow you to "retrieve" info that is not otherwise available with a standard AST parser.

One example is that you only need to use kataw.isStatementNode to find out if the current CST node is a statement node. With an AST parser you must use a switch statement with 60 switch cases.

 // With Babel you are forced to do
 switch(node.type) {
   case 'SwitchStatement': ...
   case 'ReturnStatement': ...
 }

 // With Kataw
 kataw.isStatementNode(node); // returns true

A second benefit of this CST parser is that it runs in recovery mode by default and can be used in any editor. A built-in diagnostic system reports diagnostics if an error handler has been supplied. The diagnostics are dynamic: they are informative, and they change based on the context you are parsing in.

Used together, these features give you more options to adjust, modify, and customize the CST tree than a regular AST parser does. You can also write fewer lines of code and still get very high performance.

CST nodes

All CST nodes have a kind, which is a number that represents the node type. It corresponds to the ESTree type, with the exception that Kataw doesn't do any string comparisons: everything in Kataw is a number.

Here is an example:

if (node.kind === Kataw.SyntaxKind.Identifier) {}

You need to use kataw.visitEachChild to traverse the CST tree and get access to each CST node. After that you can apply any kind of transformation.

Be aware that the kind also contains additional information that you can extract through the public API, not only the NodeFlags.

For example, Kataw.isKeyword, Kataw.isIdentifier, and Kataw.isFutureReserved.

This is possible because there are no tokens in Kataw. Everything is a SyntaxKind: token and kind merged into one.
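
One way to picture how a numeric kind can carry category information is to reserve a few high bits for it. The sketch below is a toy illustration and not Kataw's actual encoding: the IS_KEYWORD and IS_FUTURE_RESERVED constants, the SyntaxKind values, and the helper bodies are all assumptions made for the example.

```javascript
// Hypothetical encoding: low bits identify the kind, high bits tag a category.
const IS_KEYWORD = 1 << 12;
const IS_FUTURE_RESERVED = 1 << 13;

// Toy kinds, not Kataw's real values.
const SyntaxKind = {
  Identifier: 1,
  ForKeyword: 2 | IS_KEYWORD,
  EnumKeyword: 3 | IS_KEYWORD | IS_FUTURE_RESERVED,
};

// Each category check is a single bitwise test; no string comparison is needed.
function isKeyword(kind) {
  return (kind & IS_KEYWORD) !== 0;
}

function isFutureReserved(kind) {
  return (kind & IS_FUTURE_RESERVED) !== 0;
}
```

With an encoding like this, a check such as isKeyword(node.kind) costs one AND operation regardless of how many keyword kinds exist.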

Kataw also exports all CST nodes so you can create your own nodes. This is handy if you want to try out new ECMA features that aren't part of the language yet, or write your own transformers as in Babel.

Here is an example of how to create a CST node:

 // creates an identifier
 kataw.createIdentifier(/* text */ 'hello', /* rawText */ 'hello', /* start */ 1,  /* end */ 5)

Some CST nodes need additional info. This can be set using Kataw.NodeFlags, a bitwise mask that can be set on every CST node and CST keyword node.

 // creates a string literal
 const str = kataw.createStringLiteral(
    /* text */ 'hello', /* rawText */ 'hello', /* start */ 1,  /* end */ 5
);

 // Set the flag to mark it as single-quoted, e.g. 'string'
 str.flag |= Kataw.NodeFlags.SingleQuote;

 // Check if the flag is set
 kataw.isSingleQuote(str); // true
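
The flag operations above are ordinary bitmask arithmetic. Here is a self-contained sketch of setting, testing, and clearing a flag on a plain object. The NodeFlags values and the hasFlag helper are hypothetical and only mirror the names used in this README.

```javascript
// Hypothetical flag values: each flag occupies its own bit.
const NodeFlags = {
  None: 0,
  NoChildren: 1 << 0,
  ExpressionNode: 1 << 1,
  SingleQuote: 1 << 2,
};

// A stand-in for a CST node; real nodes carry more fields.
const node = {
  text: 'hello',
  flag: NodeFlags.ExpressionNode | NodeFlags.NoChildren,
};

// True when every bit of `flag` is set on the node.
function hasFlag(target, flag) {
  return (target.flag & flag) === flag;
}

node.flag |= NodeFlags.SingleQuote; // set the bit
const afterSet = hasFlag(node, NodeFlags.SingleQuote);

node.flag &= ~NodeFlags.SingleQuote; // clear the bit, leaving the others intact
const afterClear = hasFlag(node, NodeFlags.SingleQuote);
```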

CST keywords

Every keyword in Kataw is its own CST node, and you create keyword nodes in almost the same way as any other CST node.

kataw.createToken(kataw.SyntaxKind.ForKeyword, Kataw.NodeFlags.NoChildren, /* start */ 1,  /* end */ 5);

Diagnostics

A diagnostic in Kataw can be an error, a warning, or a lint failure.

The diagnostics have been designed like this so you can quickly understand what the problem is and correct it.

Adding an error handler as the third argument enables diagnostics. The diagnostics are flexible: you can use them together with Kataw's own reporters, or you can create your own reporter to fit your use case.

Here is how it works:

import { parseScript } from 'kataw';

parseScript('[x', { next: true }, function(diagnosticSource, kind, message, start, end) {
});

Diagnostic arguments

Param             Description
diagnosticSource  Either Lexer or Printer
kind              One of lint, error, or warning
message           The diagnostic message
start             The start position of the diagnostic
end               The end position of the diagnostic
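
As an illustration of writing your own reporter, the sketch below collects diagnostics into an array and derives a line and column from the start offset. It assumes that start and end are zero-based character offsets and that kind can be compared as a plain value; both are assumptions, and createCollector and offsetToLineColumn are hypothetical helpers, not part of Kataw. The handler is called by hand here; in real use you would pass collector.onDiagnostic as the third argument to the parser.

```javascript
// Convert a zero-based character offset into a 1-based line and column.
function offsetToLineColumn(source, offset) {
  let line = 1;
  let lastLineStart = 0;
  for (let i = 0; i < offset && i < source.length; i++) {
    if (source[i] === '\n') {
      line++;
      lastLineStart = i + 1;
    }
  }
  return { line, column: offset - lastLineStart + 1 };
}

// Build a handler with the five-argument shape described in the table above.
function createCollector(source) {
  const diagnostics = [];
  function onDiagnostic(diagnosticSource, kind, message, start, end) {
    const { line, column } = offsetToLineColumn(source, start);
    diagnostics.push({ diagnosticSource, kind, message, start, end, line, column });
  }
  return { diagnostics, onDiagnostic };
}

// Simulated call with made-up values; a real parser would invoke the handler itself.
const sampleSource = 'let a;\n[x';
const collector = createCollector(sampleSource);
collector.onDiagnostic('Lexer', 'error', 'sample message', 9, 9);
```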

ESNext

Stage 3 proposals can be parsed if the next option is enabled.

Stage 1 and stage 2 proposals are not supported because the spec drafts change all the time.

Types

Kataw has its own type system that is an improvement over TypeScript and Flow, and it conforms to the ECMAScript® 2022 (ECMA-262 12th Edition) language specification.

Like everything else in Kataw, it is developed for high performance and low memory consumption.

It allows you to parse syntax like function x(y: string, z: number): string | number {} and other similar syntax.

The type system is still WIP and will be enabled by default in the CLI together with Kataw's own type checker.

You can enable it manually with the allowTypes option. Kataw will then parse the types but will not do any type checking.

You can use kataw.removeKatawTypes to remove Kataw's types from the CST tree:

const source = kataw.parseModule('let x: string', { allowTypes: true });
// Remove the types
kataw.removeKatawTypes(source);

Comments

Leading and trailing comments can be extracted at the correct position with kataw.getLeadingComments and kataw.getTrailingComments.

Hello
/* I'm a comment */
  there!

The trailing comment of Hello can be retrieved with kataw.getTrailingComments(5, 24). It returns the comments between the end value of Hello and the start value of there!.

If you want a 1:1 copy of the actual source code, you can do a "slice" from the start value of Hello to the end value of there!.
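
The idea of working with positions can be shown without the parser. In this sketch the node positions are computed by hand, and commentsBetween is a hypothetical helper that only recognizes block comments; it illustrates the range-based approach and is not how Kataw extracts comments.

```javascript
const source = "Hello\n/* I'm a comment */\n  there!";

// Hand-computed positions (zero-based, end exclusive) standing in for CST nodes.
const hello = { start: 0, end: 5 };
const there = { start: source.indexOf('there!'), end: source.length };

// Collect the block comments that lie between two positions.
function commentsBetween(text, from, to) {
  const gap = text.slice(from, to);
  return gap.match(/\/\*[\s\S]*?\*\//g) || [];
}

// Comments trailing `Hello`: everything from its end to the start of `there!`.
const comments = commentsBetween(source, hello.end, there.start);

// A 1:1 copy of the source, whitespace and comments included.
const verbatim = source.slice(hello.start, there.end);
```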

Linting

Rules are still being added, but Kataw can already lint either through public API methods or through parser options. Most of ESLint's common or recommended rules also work in Kataw, and you can enable or disable each of them.

Linting with public API

It can be done like this:

import { lintScript } from 'kataw';

lintScript('eval()', { reporter: aladdin }, { noEval: true});

Linting with parser options

import { parseScript } from 'kataw';

parseScript('eval()', { noEval: true});

The DiagnosticKind will be set to DiagnosticKind.Lint. You can choose to ignore this and treat the diagnostic like any other error, or you can, for example, create your own reporter.
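
Because lint failures arrive alongside other diagnostics, a custom reporter can branch on the kind. This is a minimal sketch that assumes a numeric DiagnosticKind enum; the enum values, the diagnostic object shape, and the sample messages are made up for illustration.

```javascript
// Hypothetical enum values; Kataw's real DiagnosticKind may differ.
const DiagnosticKind = { Error: 0, Warning: 1, Lint: 2 };

// Split diagnostics so lint failures can be reported separately from the rest.
function partitionDiagnostics(diagnostics) {
  const lint = [];
  const other = [];
  for (const diagnostic of diagnostics) {
    if (diagnostic.kind === DiagnosticKind.Lint) {
      lint.push(diagnostic);
    } else {
      other.push(diagnostic);
    }
  }
  return { lint, other };
}

// Sample input with invented messages.
const { lint, other } = partitionDiagnostics([
  { kind: DiagnosticKind.Lint, message: 'sample lint failure' },
  { kind: DiagnosticKind.Error, message: 'sample syntax error' },
]);
```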

Transformation

Kataw can act the same way as Babel and be a tool that helps you write code in the latest version of JavaScript. This is done by developing transformers that handle situations where your supported environments don't support certain features natively.

The compiler transforms those features down to a supported version.

You have to use kataw.visitEachChild to traverse the CST tree. kataw.visitNode can be used to traverse a single node, and kataw.visitNodes to visit an array of CST nodes. kataw.visitNodes should only be used on lists, that is, on CST nodes that are known to contain an array. There is then no need to call, for example, Array.isArray to verify that the value is an array, which keeps performance high.

All CST nodes will be updated automatically if any changes have been detected.

Keywords can also be swapped around, and so can the operands of AssignmentExpression, BinaryExpression, UnaryExpression, and UpdateExpression. For example, !== can be changed to ===.

A WithStatement can be transformed into a WhileStatement simply by changing the value of the TokenNode.

The location of the CST node in the CST tree can also be changed if you change the values of start and end on the CST node.

Changing the NodeFlags allows you to change how the CST node should behave.

All of this gives you better control over the transformation of each CST node compared to Babel and Rome.

Here is an example of a simple transformer that replaces all identifiers with a NumericLiteral.

import * as kataw from 'kataw';

export function swapIdentifierWithNumeric(transform) {
  return transformSourceFile;

  // Entry point: walk the children of the root node.
  function transformSourceFile(root) {
    return kataw.visitEachChild(transform, root, visitor);
  }

  // Replace every identifier with the numeric literal 123.
  function visitor(node) {
    switch (node.kind) {
      case kataw.NodeKind.Identifier:
        return kataw.createNumericLiteral(
          123,
          "123",
          kataw.NodeFlags.ExpressionNode | kataw.NodeFlags.NoChildren,
          /* start */ 1,
          /* end */ 3
        );
      default:
        return kataw.visitEachChild(transform, node, visitor);
    }
  }
}
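
To see the visitor pattern in isolation, here is a toy version that runs on plain objects without Kataw. It replaces identifiers with a numeric literal and swaps !== for ===. The Kind values, the node shape with a children array, and this two-argument visitEachChild are simplifications invented for the sketch; Kataw's own visitEachChild takes different arguments.

```javascript
// Toy numeric kinds; Kataw's real values differ.
const Kind = { Program: 0, BinaryExpression: 1, Identifier: 2, NumericLiteral: 3 };

// Visit every child. Return the same node if nothing changed, else a shallow copy.
function visitEachChild(node, visitor) {
  if (!node.children) return node;
  let changed = false;
  const children = node.children.map((child) => {
    const next = visitor(child);
    if (next !== child) changed = true;
    return next;
  });
  return changed ? { ...node, children } : node;
}

function visitor(node) {
  switch (node.kind) {
    case Kind.Identifier:
      // Swap the identifier for a numeric literal.
      return { kind: Kind.NumericLiteral, value: 123 };
    case Kind.BinaryExpression: {
      // Transform the operands first, then the operator.
      const visited = visitEachChild(node, visitor);
      return visited.operator === '!==' ? { ...visited, operator: '===' } : visited;
    }
    default:
      return visitEachChild(node, visitor);
  }
}

// Toy tree for `x !== 1`.
const tree = {
  kind: Kind.Program,
  children: [
    {
      kind: Kind.BinaryExpression,
      operator: '!==',
      children: [
        { kind: Kind.Identifier, text: 'x' },
        { kind: Kind.NumericLiteral, value: 1 },
      ],
    },
  ],
};

const result = visitor(tree);
```

Returning the original node when no child changed keeps untouched subtrees shared between the old and the new tree, so the input tree is never mutated.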

Printing

Kataw is adjustable and allows three different ways to print your source code.

The returned source does not include any extra parentheses or unnecessary code.

The comments are 100% correct and they will be printed in the places you expect.

API          Description
print        Prints a given CST tree and lets you adjust the diagnostics and set your own parser options
printModule  Prints the source in module goal
printScript  Prints the source in script mode

Here is an example:

// Print
 kataw.print(kataw.parseModule('x = y', { next: true }, function(source, kind, msg, line, column) {
    throw msg + '(' + line + ', ' + column + ')';
 }));

 // Print with module goal
 kataw.printModule('x = y');

 // Print in script mode
 kataw.printScript('x = y');

Ignore comment

Statements, blocks and other code lines can be ignored in Kataw with a // kataw-ignore comment.

If set on a WhileStatement it will ignore the entire statement and the BlockStatement.

// kataw-ignore
while (true) {}

You can use kataw.shouldIgnoreNextNode(node) to check whether the node should be ignored.
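
For intuition, a text-level version of this check can be sketched as follows. It only looks at the last non-empty line before a given start offset, and isPrecededByIgnoreComment is a hypothetical helper; Kataw's own check works on the CST and may behave differently.

```javascript
// True when the last non-empty line before `nodeStart` is a `// kataw-ignore` comment.
function isPrecededByIgnoreComment(source, nodeStart) {
  // Text before the node, with trailing whitespace and newlines removed.
  const before = source.slice(0, nodeStart).trimEnd();
  const lastLine = before.slice(before.lastIndexOf('\n') + 1).trim();
  return lastLine === '// kataw-ignore';
}

const program = '// kataw-ignore\nwhile (true) {}\nfoo();';

const ignoreWhile = isPrecededByIgnoreComment(program, program.indexOf('while'));
const ignoreCall = isPrecededByIgnoreComment(program, program.indexOf('foo'));
```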

CST parser features

  • Error recovery by default (like Acorn loose), but it reconstructs the CST tree correctly

  • Optional error reporting (requires a callback as the parser's third argument)

  • Dynamic error, hint and warning diagnostics (depends on the context you are parsing in)

  • Public API methods to extract info from the CST nodes

  • 100% correct comment extraction and attachment algorithm

  • Can parse types and type annotations (Kataw has its own type system)

  • Can be used in any editor

  • Scalable

  • Performance

Current state

  • The CST parser can be used in production

Roadmap

📌 v0.1

  • Parsing ECMA-262 (aka JavaScript), with a stable CST spec
  • Test262 passes
  • Printing API (like prettier API)
  • // kataw-ignore (like // prettier-ignore)
  • Command line interface (like prettier cli)
  • Documentation & website

v0.2

  • plugin system, to make it possible to support jsx/ts/flow...
  • jsx plugin
  • ts plugin

v0.3

  • transformers: like babel
  • minify: like uglify-js
  • linter: like eslint

v1.0

Future

  • A "hook system" for adding additional rules for the linter and the grammar checker will be published.

  • Hooks to support experimental syntax and ECMA proposals in a sandboxed environment