
MinusGix/tree-sitter-markdown

 
 


tree-sitter-markdown

A markdown parser for tree-sitter

For now this implements the CommonMark Spec. It may later be extended to support GitHub Flavored Markdown.

Structure

The parser is split into two grammars: one for the block structure, which can be found in /tree-sitter-markdown, and one for the inline structure, which is in /tree-sitter-markdown-inline. Because of this, the entire document has to be scanned twice in order to be fully parsed. This is motivated by the parsing strategy section of the CommonMark Spec, which suggests doing exactly this: parsing the document twice, first determining the block structure and then parsing any inline content.

It also helps manage complexity, which was a problem with earlier versions of this parser, by allowing block and inline structure to be considered separately. That was not possible in a single grammar, because tree-sitter's dynamic precedence can create hard-to-predict effects.

Usage

To use the two grammars, first parse the document with the block grammar. Then parse it a second time with the inline grammar, using ts_parser_set_included_ranges to restrict that parse to the parts of the document that are inline content. The block grammar marks these parts as inline nodes. The ranges covered by children of those inline nodes should be excluded from the included ranges. For an example implementation, see lib.rs in the bindings folder.
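The range computation described above can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the code from lib.rs: the ByteRange type and the inline_content_ranges function are hypothetical names, and the sketch tracks byte offsets only, whereas the ranges passed to ts_parser_set_included_ranges also carry row/column points. Given the range of one inline node and the ranges of its children, it returns the pieces that would be handed to the inline grammar.

```rust
/// Half-open byte range [start, end) in the source text.
/// Hypothetical helper type for this sketch; tree-sitter's own range type
/// additionally stores start and end points (row/column).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ByteRange {
    pub start: usize,
    pub end: usize,
}

/// Returns the parts of `inline` that are not covered by any range in
/// `children`. Assumes `inline.start <= inline.end`.
pub fn inline_content_ranges(inline: ByteRange, children: &[ByteRange]) -> Vec<ByteRange> {
    // Work on a sorted copy so the caller may pass children in any order.
    let mut sorted = children.to_vec();
    sorted.sort_by_key(|c| c.start);

    let mut result = Vec::new();
    let mut cursor = inline.start;
    for child in sorted {
        // Clamp each child to the inline node so stray ranges cannot widen the result.
        let child_start = child.start.max(inline.start).min(inline.end);
        let child_end = child.end.max(inline.start).min(inline.end);
        // Keep the gap between the previous child (or the node start) and this child.
        if child_start > cursor {
            result.push(ByteRange { start: cursor, end: child_start });
        }
        cursor = cursor.max(child_end);
    }
    // Keep whatever remains after the last child.
    if cursor < inline.end {
        result.push(ByteRange { start: cursor, end: inline.end });
    }
    result
}
```

In a real integration, one would presumably collect these pieces for every inline node found by the block parse, convert them to tree-sitter ranges, and pass the whole list to the inline parser before the second parse.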
