Shared Parser Library #10765

Open
@matklad

Description

So.... I've been trying to make a rustc/rust-analyzer shared parser library for the past four years (rust-analyzer was originally intended to be just a parser library), and the results have been meagre -- we share the lexer, and that's it. My theory is that's due to org stuff -- parser/AST has wide APIs, so extracting that is a whole lot of poorly factorable work. As such, other, more immediate things tend to always get higher priority. But today rust-analyzer feels like it is on a relatively stable footing, so it seems like a good opportunity to try to move the giant ship for real.

Let's see what we need to do to achieve that:

  1. Have an isolated, IDE-friendly Rust parsing library.
  • separate repo from ra? Or at least /libs folder?
  • get rid of Source/Sink traits, use more direct API
  2. Figure out the best way to integrate with rustc.
  • Tree -> Tree transformation (there was a PR to rustc proving feasibility)
    • need to stabilize & finalize rowan for that
    • perf?
  • Parser -> (Tree1, Tree2)
    • how to emit typed AST out of an untyped parser?
      • ungrammar
  • Concede that sharing a "nice" library is infeasible, and just hack today's parser to emit CST via cfg flags
  3. Do cleanups on rustc side.
  • harmonize token tree model (always use split tokens)
  • reduce dependencies on global state (remove code-map from parser, allow for shared-nothing parallel parsing)
  • how to handle Interpolated tokens?
  4. Implement the merge
  • ??? and lots of work
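
To make the "Parser -> (Tree1, Tree2)" option above a bit more concrete, here is a minimal sketch, assuming the parser emits a flat list of events that independent consumers turn into different trees. Every name in it (`Event`, `parse_sum`, `build_untyped`, `build_typed`, `Expr`) is hypothetical and exists only for illustration; none of it is rust-analyzer's or rustc's actual API, and the "parser" only understands sums like `1 + 2`.

```rust
/// Flat parser output. Hypothetical event type, not the real one.
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    Start(&'static str), // open a node of the given kind
    Token(String),       // a leaf token with its text
    Finish,              // close the most recently opened node
}

/// Toy "parser" for inputs like `1 + 2`: it emits events and builds no tree.
pub fn parse_sum(input: &str) -> Vec<Event> {
    let mut events = vec![Event::Start("BIN_EXPR")];
    for word in input.split_whitespace() {
        if word == "+" {
            events.push(Event::Token(word.to_string()));
        } else {
            events.push(Event::Start("LITERAL"));
            events.push(Event::Token(word.to_string()));
            events.push(Event::Finish);
        }
    }
    events.push(Event::Finish);
    events
}

/// Consumer 1: an untyped tree, rendered here as an S-expression.
pub fn build_untyped(events: &[Event]) -> String {
    let mut out = String::new();
    for ev in events {
        match ev {
            Event::Start(kind) => {
                if !out.is_empty() {
                    out.push(' ');
                }
                out.push('(');
                out.push_str(kind);
            }
            Event::Token(text) => {
                out.push(' ');
                out.push_str(&format!("{:?}", text));
            }
            Event::Finish => out.push(')'),
        }
    }
    out
}

/// Consumer 2: a typed AST built from the same events.
#[derive(Debug, PartialEq)]
pub enum Expr {
    Lit(i64),
    Add(Box<Expr>, Box<Expr>),
}

pub fn build_typed(events: &[Event]) -> Option<Expr> {
    let mut acc: Option<Expr> = None;
    let mut in_literal = false;
    for ev in events {
        match ev {
            Event::Start(kind) => in_literal = *kind == "LITERAL",
            Event::Finish => in_literal = false,
            Event::Token(text) if in_literal => {
                // Each literal is folded into a left-associative sum.
                let lit = Expr::Lit(text.parse().ok()?);
                acc = Some(match acc.take() {
                    None => lit,
                    Some(lhs) => Expr::Add(Box::new(lhs), Box::new(lit)),
                });
            }
            Event::Token(_) => {} // the operator carries no data in this toy AST
        }
    }
    acc
}
```

Calling `parse_sum("1 + 2")` once and passing the result to both `build_untyped` and `build_typed` gives two trees from a single parse. The point of the sketch is only that the parser itself never names a tree type; whether this shape is workable for rustc's AST is exactly the open question in the list above.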

Tasks:

  • switch from TokenSource to SOA tokens (internal: switch from trait-based TokenSource to simple struct of arrays #10995)
  • move text-based lexing to the parser as well.
  • switch from TreeSink to emitting a vec of events.
  • move trivia attachment logic to the parser
  • move tests to parser
    • move lexer tests
    • move parser tests
    • add dedicated tests for ws attachment
  • get rid of synthetic_root
  • remove parse_text_as
    • split parse into "parse top level" and "parse prefix"
    • remove extra argument from build_tree
  • audit TopEntryPoint / PrefixEntryPoint to make sure it doesn't have some leftovers
    • add tests for non-main prefix entry points
    • add tests for non-main top entry points
  • figure out invariant for parse -- it doesn't parse the whole file. Four cases:
    • Parse the whole input as SourceFile -- working as intended today
    • Parse prefix of the input as $expr, for MBE -- sorta-working
    • Parse the whole input as $expr, for MBE output -- broken, primarily because it isn't separated from the previous point
    • Parse ??? as a template for SSR -- was never properly considered as a design goal, creates a lot of pains for the interface.
  • prototype structured AST creation from structured tokens
  • fix FIXMEs in prefix entry point tests
    • ensure there are macro-expansion level tests
  • add ast::EmptyStmt
  • move to libs dir
  • pick name (robust_parser? rust is substring. sisyphus also is a fitting name)
  • publish to crates.io
  • document guidelines (data-based interface, simple, not necessarily minimal (hooks for recovery, etc), flexible).
  • figure out the story for incremental parsing
    • move incremental parsing to parser crate
  • figure out the story for macros
    • rustc-style tree captures via $expr
    • parsing prefixes without creating tokens for everything
    • TokenMap which works
  • structured lexer errors.
  • drop limit dependency.
  • restore {} invariant
    • strengthen the invariant to cover all kinds of parentheses for macros?
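
As an illustration of the "struct of arrays" token task in the list above, here is a minimal sketch, assuming tokens are stored as parallel vectors of kinds and start offsets instead of being pulled through a trait. The names `Kind`, `Tokens`, `classify` and `lex` are hypothetical and the lexing rules are deliberately toy-sized; this is not the code from #10995.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    Ident,
    Number,
    Whitespace,
    Punct,
}

/// Struct-of-arrays token storage: entry `i` of each vector describes token `i`.
#[derive(Debug, Default)]
pub struct Tokens {
    kinds: Vec<Kind>,
    starts: Vec<u32>, // byte offset where each token begins
    end: u32,         // total input length, closes the last token
}

impl Tokens {
    pub fn len(&self) -> usize {
        self.kinds.len()
    }

    pub fn kind(&self, i: usize) -> Kind {
        self.kinds[i]
    }

    /// Byte range of token `i`; the end is the start of the next token.
    pub fn range(&self, i: usize) -> std::ops::Range<usize> {
        let start = self.starts[i] as usize;
        let end = self.starts.get(i + 1).copied().unwrap_or(self.end) as usize;
        start..end
    }
}

fn classify(c: char) -> Kind {
    if c.is_alphabetic() || c == '_' {
        Kind::Ident
    } else if c.is_ascii_digit() {
        Kind::Number
    } else if c.is_whitespace() {
        Kind::Whitespace
    } else {
        Kind::Punct
    }
}

/// Toy lexer: merges runs of the same kind, but always splits punctuation
/// into single-character tokens (so `>>` is two tokens).
pub fn lex(text: &str) -> Tokens {
    let mut tokens = Tokens { end: text.len() as u32, ..Tokens::default() };
    let mut prev: Option<Kind> = None;
    for (offset, c) in text.char_indices() {
        // Digits directly after an identifier continue that identifier.
        let kind = if prev == Some(Kind::Ident) && c.is_ascii_digit() {
            Kind::Ident
        } else {
            classify(c)
        };
        if prev != Some(kind) || kind == Kind::Punct {
            tokens.kinds.push(kind);
            tokens.starts.push(offset as u32);
        }
        prev = Some(kind);
    }
    tokens
}
```

For example, `lex("let x = 42;")` yields eight tokens, and `&text[tokens.range(6)]` recovers the text `42` without the token storing it. A consumer only needs indices into plain vectors, which is the kind of data-based interface the guidelines item asks for.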

Labels: A-parser (parser issues), C-Architecture (Big architectural things which we need to figure up-front (or suggestions for rewrites :0) )), E-hard, S-unactionable (Issue requires feedback, design decisions or is blocked on other work), fun (A technically challenging issue with high impact)