So... I've been trying to make a shared rustc/rust-analyzer parser library for the past four years (rust-analyzer was originally intended to be just a parser library), and the results have been meagre -- we share the lexer, and that's it. My theory is that this is due to org stuff -- the parser/AST has wide APIs, so extracting that is a whole lot of poorly factorable work. As such, other, more immediate things always tend to get higher priority. But today rust-analyzer feels like it is on a relatively stable footing, so it seems like a good opportunity to try to move the giant ship for real.
Let's see what we need to do to achieve that:
- Have an isolated, IDE-friendly Rust parsing library.
- separate repo from ra? Or at least a `/libs` folder?
- get rid of Source Sync traits, use more direct API
- Figure out the best way to integrate with rustc.
- Tree -> Tree transformation (there was a PR to rustc proving feasibility)
- need to stabilize & finalize rowan for that
- perf?
- Parser -> (Tree1, Tree2)
- how to emit typed ast out of untyped parser?
- ungrammar
- Concede that sharing a "nice" library is infeasible, and just hack today's parser to emit CST via `cfg` flags
- Do cleanups on rustc side.
- harmonize token tree model (always use split tokens)
- reduce dependencies on global state (remove code-map from parser, allow for shared-nothing parallel parsing)
- how to handle Interpolated tokens?
- Implement the merge
- ??? and lots of work
Tasks:
- switch from TokenSource to SOA tokens (internal: switch from trait-based TokenSource to simple struct of arrays, #10995)
- move text-based lexing to the parser as well.
- switch from TreeSink to emitting a vec of events.
- move trivia attachment logic to the parser
- move tests to parser
- move lexer tests
- move parser tests
- add dedicated tests for ws attachment
- get rid of `synthetic_root`
- remove `parse_text_as`
- split parse into "parse top level" and "parse prefix"
- remove extra argument from `build_tree`
- audit TopEntryPoint / PrefixEntryPoint to make sure it doesn't have some leftovers
- add tests for non-main prefix entry points
- add tests for non-main top entry points
- figure out invariant for `parse` -- it doesn't parse the whole file. Four cases:
  - Parse the whole input as SourceFile -- working as intended today
  - Parse prefix of the input as `$expr`, for MBE -- sorta-working
  - Parse the whole input as `$expr`, for MBE output -- broken, primarily because it isn't separated from the previous point
  - Parse ??? as a template for SSR -- was never properly considered as a design goal, creates a lot of pains for the interface.
- prototype structured AST creation from structured tokens
- fix FIXMEs in prefix entry point tests
- ensure there are macro-expansion level tests
- add `ast::EmptyStmt`
- move to libs dir
- pick name (`robust_parser`? rust is substring. sisyphus also is a fitting name)
- publish to crates.io
- document guidelines (data-based interface, simple, not necessarily minimal (hooks for recovery, etc), flexible).
- figure out the story for incremental parsing
- move incremental parsing to parser crate
- figure out the story for macros
- rustc-style tree captures via `$expr`
- parsing prefixes without creating tokens for everything
- TokenMap which works
- structured lexer errors.
- drop `limit` dependency.
- restore `{}` invariant
  - strengthen the invariant to cover all kinds of parenthesis for macros?
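
To make the "data-based interface" idea from the tasks concrete, here is a minimal sketch of what "struct of arrays tokens in, vec of events out" could look like. Everything in it is hypothetical and for illustration only: `Kind`, `Tokens`, `Event`, and `parse_sum` are made-up names, the grammar is a toy (`ident ('+' ident)*`), and none of this is the actual parser crate API.

```rust
/// Hypothetical syntax kinds; a real parser would have many more.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    Ident,
    Plus,
    Eof,
    /// Node kind for the toy `a + b + c` grammar.
    Sum,
}

/// Struct-of-arrays input: parallel vectors, one entry per token.
/// Trivia is assumed to be filtered out before this point.
#[derive(Debug, Default)]
struct Tokens {
    kinds: Vec<Kind>,
    /// Byte offset at which each token starts.
    starts: Vec<u32>,
}

impl Tokens {
    fn push(&mut self, kind: Kind, start: u32) {
        self.kinds.push(kind);
        self.starts.push(start);
    }

    /// Kind of the token at `idx`, or `Eof` past the end of input.
    fn kind(&self, idx: usize) -> Kind {
        self.kinds.get(idx).copied().unwrap_or(Kind::Eof)
    }

    /// Start offset of the token at `idx`; panics if `idx` is out of range.
    fn start(&self, idx: usize) -> u32 {
        self.starts[idx]
    }
}

/// Parser output: a flat list of events instead of calls into a sink trait.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Event {
    /// Start a node of the given kind.
    Enter(Kind),
    /// Consume the next input token into the current node.
    Token,
    /// Report an error at the current position.
    Error(String),
    /// Finish the current node.
    Exit,
}

/// Parses `ident ('+' ident)*` and returns the events describing the tree.
fn parse_sum(tokens: &Tokens) -> Vec<Event> {
    let mut events = vec![Event::Enter(Kind::Sum)];
    let mut pos = 0;
    expect_ident(tokens, &mut pos, &mut events);
    while tokens.kind(pos) == Kind::Plus {
        events.push(Event::Token);
        pos += 1;
        expect_ident(tokens, &mut pos, &mut events);
    }
    events.push(Event::Exit);
    events
}

/// Consumes an identifier, or records an error without consuming anything.
fn expect_ident(tokens: &Tokens, pos: &mut usize, events: &mut Vec<Event>) {
    if tokens.kind(*pos) == Kind::Ident {
        events.push(Event::Token);
        *pos += 1;
    } else {
        events.push(Event::Error("expected identifier".to_string()));
    }
}
```

The point of the sketch is that both sides of the parser are plain data: the caller builds `Tokens` however it likes (from text, or from macro token trees), and builds whatever tree it wants from the returned `Vec<Event>`, so no trait has to cross the crate boundary.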