Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mapping C and Core line numbers for Cerberus #6990

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

vzaliva
Copy link
Contributor

@vzaliva vzaliva commented Oct 22, 2024

For the Cerberus and Cerberus-CHERI backends, we now display an intermediate language called Core instead of assembly. This patch adds line number information that maps some Core sub-expressions to their corresponding lines in the C code.

Copy link
Member

@mattgodbolt mattgodbolt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If possible, can you add some tests to try and prevent any bitrot on our side? Thanks!

lib/compilers/cerberus.ts Outdated Show resolved Hide resolved
@junlarsen junlarsen self-requested a review October 29, 2024 02:57
Copy link
Member

@junlarsen junlarsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm personally a little hesitant to bring in the entire dependency of tree-sitter for a single compiler. Maybe some other maintainers have comments? cc @OfekShilon @jeremy-rifkin @AbrilRBS

@jeremy-rifkin
Copy link
Member

I share your hesitancy. How hard would it be to extract the relevant info without tree-sitter? We do this for most compilers, though ad-hoc parsers have their downsides.

@vzaliva
Copy link
Contributor Author

vzaliva commented Oct 29, 2024

The problem is that, in the Cerberus Core output, annotations denoting the corresponding regions in the C code precede the Core language constructs (expressions, definitions, etc.) to which they apply. It is trivial to parse the annotation comments using simple regexps, but detecting the bounds of the Core expression (possibly spanning multiple lines) following the annotation requires deeper parsing of the language.

I am working on another patch that would allow the use of tree-sitter grammars for syntax highlighting and the use of existing tree-sitter grammars for other languages. Thus, the potential benefit of tree-sitter dependency would not be just for one language. It is a nice parsing framework that integrates well with both backend (Node.js) and frontend parsing (via wasm support and monaco-tree-sitter package). Tree-sitter-based parsers are also typically robust and suitable for real-time parsing and syntax highlighting while editing text in real time.

I understand your hesitancy to add another dependency. I am not familiar enough with the Node.js infrastructure, but perhaps it is possible to make this dependency optional, so it will only be required when the project is built with Cerberus compiler support?

@junlarsen
Copy link
Member

From what I can tell, monaco-tree-sitter is very unmaintained and it would add a lot of complexity to attempt to bundle it on the client side (besides, that would be a second wasm build of tree sitter) https://github.com/Menci/monaco-tree-sitter

@vzaliva
Copy link
Contributor Author

vzaliva commented Oct 29, 2024

To clarify: I am not suggesting yet bringing in a monaco-tree-sitter. This is a discussion for another PR :) I just mentioned it as a potential future benefit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants