Shared Parser Library

So, I've been trying to build a shared rustc/rust-analyzer parser library for the past four years (rust-analyzer was originally intended to be just a parser library), and the results have been meagre: we share the lexer, and that's it. My theory is that this is an organizational problem: the parser/AST has a wide API surface, so extracting it is a whole lot of poorly factorable work. As such, other, more immediate things always tend to get higher priority. But today rust-analyzer feels like it is on relatively stable footing, so it seems like a good opportunity to try to move the giant ship for real.

Let's see what we need to do to achieve that: 

1. Have an isolated, IDE-friendly Rust parsing library.
  * separate repo from ra? Or at least `/libs` folder?
  * get rid of the Source/Sink traits (`TokenSource`, `TreeSink`), use a more direct API
2. Figure out the best way to integrate with rustc.
  * Tree -> Tree transformation (there was a PR to rustc proving feasibility)
    * need to stabilize & finalize rowan for that
    * perf?
  * Parser -> (Tree1, Tree2)
    * how to emit typed ast out of untyped parser?
      * ungrammar
  * Concede that sharing a "nice" library is infeasible, and just hack today's parser to emit a CST via `cfg` flags
3. Do cleanups on rustc side.
  * harmonize token tree model (always use split tokens)
  * reduce dependencies on global state (remove code-map from parser, allow for shared-nothing parallel parsing)
  * how to handle Interpolated tokens?
4. Implement the merge
  * ??? and lots of work
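
For the "how to emit typed ast out of untyped parser?" question in item 2, here is a minimal sketch of one possible shape: the parser only ever builds an untyped tree, and the typed AST is a thin wrapper that checks the node kind on `cast`. All names here (`SyntaxKind`, `SyntaxElement`, `FnDef`, `Name`) are hypothetical and simplified for illustration; this is not rowan's or rustc's actual API.

```rust
/// Hypothetical node/token kinds for the sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SyntaxKind {
    Fn,
    Name,
    Ident,
}

/// Untyped tree element. Tokens carry text and have no children;
/// nodes have children and an empty `text`.
#[derive(Debug, Clone)]
struct SyntaxElement {
    kind: SyntaxKind,
    text: String,
    children: Vec<SyntaxElement>,
}

impl SyntaxElement {
    fn token(kind: SyntaxKind, text: &str) -> SyntaxElement {
        SyntaxElement { kind, text: text.to_string(), children: Vec::new() }
    }

    fn node(kind: SyntaxKind, children: Vec<SyntaxElement>) -> SyntaxElement {
        SyntaxElement { kind, text: String::new(), children }
    }

    /// Concatenated text of all tokens under this element.
    fn full_text(&self) -> String {
        if self.children.is_empty() {
            self.text.clone()
        } else {
            self.children.iter().map(|c| c.full_text()).collect()
        }
    }
}

/// Typed views: wrappers that only check the kind of the untyped element.
struct FnDef<'a>(&'a SyntaxElement);
struct Name<'a>(&'a SyntaxElement);

impl<'a> FnDef<'a> {
    fn cast(el: &'a SyntaxElement) -> Option<FnDef<'a>> {
        if el.kind == SyntaxKind::Fn { Some(FnDef(el)) } else { None }
    }

    /// First child that is a `Name` node, if any.
    fn name(&self) -> Option<Name<'a>> {
        self.0.children.iter().find_map(|c| Name::cast(c))
    }
}

impl<'a> Name<'a> {
    fn cast(el: &'a SyntaxElement) -> Option<Name<'a>> {
        if el.kind == SyntaxKind::Name { Some(Name(el)) } else { None }
    }

    fn text(&self) -> String {
        self.0.full_text()
    }
}

/// Builds a tiny tree by hand and reads the function name through the typed layer.
fn demo_fn_name() -> Option<String> {
    let name = SyntaxElement::node(
        SyntaxKind::Name,
        vec![SyntaxElement::token(SyntaxKind::Ident, "main")],
    );
    let func = SyntaxElement::node(SyntaxKind::Fn, vec![name]);
    FnDef::cast(&func)?.name().map(|n| n.text())
}
```

With a layout like this the parser never has to know about typed nodes, and the wrappers are regular enough that they could plausibly be generated from a grammar description, which is where ungrammar would come in.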


Tasks: 

- [x] switch from TokenSource to SOA tokens #10995
- [x] move text-based lexing to the parser as well.
- [x] switch from TreeSink to emitting a vec of events.
- [x] move trivia attachment logic to the parser
- [x] move tests to parser
  - [x] move lexer tests
  - [x] move parser tests
  - [x] add dedicated tests for ws attachment
- [x] get rid of `synthetic_root`
- [x] remove `parse_text_as`
  - [x] split parse into "parse top level" and "parse prefix"
  - [x] remove extra argument from `build_tree`
- [x] audit TopEntryPoint / PrefixEntryPoint to make sure it doesn't have some leftovers
  - [x] add tests for non-main prefix entry points
  - [x] add tests for non-main top entry points
- [x] figure out invariant for `parse` -- it doesn't parse the whole file. Four cases:
  * Parse the whole input as SourceFile -- working as intended today
  * Parse prefix of the input as `$expr`, for MBE -- sorta-working
  * Parse the whole input as `$expr`, for MBE output -- broken, primarily because it isn't separated from the previous point
  * Parse ??? as a template for SSR -- was never properly considered as a design goal, creates a lot of pain for the interface.
- [ ] prototype structured AST creation from structured tokens
- [ ] fix FIXMEs in prefix entry point tests
  - [ ] ensure there are macro-expansion level tests
- [ ] add `ast::EmptyStmt`
- [ ] move to libs dir
- [ ] pick name (`robust_parser`? "rust" is a substring. sisyphus is also a fitting name)
- [ ] publish to crates.io
- [ ] document guidelines (data-based interface, simple, not necessarily minimal (hooks for recovery, etc.), flexible).
- [ ] figure out the story for incremental parsing
  - [ ] move incremental parsing to parser crate
- [ ] figure out the story for macros
  * rustc-style tree captures via `$expr` 
  * parsing prefixes without creating tokens for everything
  * TokenMap which works
- [ ] structured lexer errors.
- [ ] drop `limit` dependency. 
- [ ] restore `{}` invariant
  - [ ] strengthen the invariant to cover all kinds of brackets for macros?
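
To make the "SOA tokens" and "vec of events" tasks more concrete, here is a rough sketch of what a data-based interface could look like: the lexer returns parallel vectors instead of a `Vec<Token>`, and the parser returns a flat list of events that the caller replays into whatever tree it wants. Everything here (`Kind`, `LexedTokens`, `Event`, `parse_sum`, `render`) is a hypothetical toy, not the actual parser crate's API, and the "grammar" is just identifiers joined by `+`.

```rust
/// Hypothetical token and node kinds for the sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    Ident,
    Plus,
    Whitespace,
    BinExpr,
    NameRef,
}

/// Struct-of-arrays lexer output: parallel vectors rather than a Vec of token structs.
#[derive(Debug, Default)]
struct LexedTokens {
    kinds: Vec<Kind>,
    /// Start offset of each token, plus one trailing entry holding the text length.
    starts: Vec<u32>,
}

impl LexedTokens {
    /// Toy lexer: `+`, runs of ASCII whitespace, and everything else as identifiers.
    fn lex(text: &str) -> LexedTokens {
        let bytes = text.as_bytes();
        let mut res = LexedTokens::default();
        let mut pos = 0;
        while pos < bytes.len() {
            let start = pos;
            let kind = if bytes[pos] == b'+' {
                pos += 1;
                Kind::Plus
            } else if bytes[pos].is_ascii_whitespace() {
                while pos < bytes.len() && bytes[pos].is_ascii_whitespace() {
                    pos += 1;
                }
                Kind::Whitespace
            } else {
                while pos < bytes.len() && bytes[pos] != b'+' && !bytes[pos].is_ascii_whitespace() {
                    pos += 1;
                }
                Kind::Ident
            };
            res.kinds.push(kind);
            res.starts.push(start as u32);
        }
        res.starts.push(bytes.len() as u32);
        res
    }

    fn len(&self) -> usize {
        self.kinds.len()
    }

    /// Byte range of token `i` in the original text.
    fn range(&self, i: usize) -> std::ops::Range<usize> {
        self.starts[i] as usize..self.starts[i + 1] as usize
    }
}

/// Flat parser output: plain data, no callbacks into a tree builder.
#[derive(Debug, PartialEq, Eq)]
enum Event {
    Start(Kind),
    Token { index: usize },
    Finish,
}

/// Toy parser: wraps each identifier in a NameRef and the whole input in a BinExpr.
/// Whitespace is skipped here; real trivia attachment is out of scope for the sketch.
fn parse_sum(tokens: &LexedTokens) -> Vec<Event> {
    let mut events = vec![Event::Start(Kind::BinExpr)];
    for i in 0..tokens.len() {
        match tokens.kinds[i] {
            Kind::Whitespace => {}
            Kind::Ident => {
                events.push(Event::Start(Kind::NameRef));
                events.push(Event::Token { index: i });
                events.push(Event::Finish);
            }
            _ => events.push(Event::Token { index: i }),
        }
    }
    events.push(Event::Finish);
    events
}

/// One possible consumer: replay the events into an S-expression string.
fn render(text: &str, tokens: &LexedTokens, events: &[Event]) -> String {
    let mut out = String::new();
    for event in events {
        match event {
            Event::Start(kind) => out.push_str(&format!("({:?} ", kind)),
            Event::Token { index } => {
                out.push_str(&text[tokens.range(*index)]);
                out.push(' ');
            }
            Event::Finish => {
                if out.ends_with(' ') {
                    out.pop();
                }
                out.push_str(") ");
            }
        }
    }
    out.trim_end().to_string()
}
```

For the input `a + b`, `render` produces `(BinExpr (NameRef a) + (NameRef b))`. The point of the design is that both sides of the parser are plain data, so a different consumer (a rowan tree builder, or something on the rustc side) could replay the same events without the parser knowing about it.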

Shared Parser Library #10765
