Improve NASM lexer #1428

pyrmont · 2020-02-05T18:26:59Z

As discussed in a number of issues (notably #483 and #1310), there are multiple problems with the NASM lexer. Part of me wonders if the better solution isn't to just throw out these rules and rewrite it based on Chroma's NASM lexer.

Thoughts @travisdowns and @bernardosulzbach?

lib/rouge/lexers/nasm.rb

bernardosulzbach · 2020-02-05T18:45:27Z

No thoughts, do you know if Chroma's NASM lexer is any good?

bernardosulzbach · 2020-02-05T18:49:08Z

In time, don't you think we should have a helper function that gets a set of lowercase strings and returns a set with all those strings and all of them in uppercase as well? This would remove a lot of repeated code, which would prevent us from missing one version of the instruction codes for some instructions and would make these Ruby files smaller and more readable.

travisdowns · 2020-02-05T18:57:58Z

@pyrmont - I don't have any direct experience Chroma's lexer. It does seem to take a simple strategy at least with respect to instructions: it doesn't have an exhaustive list of instructions but just treats anything that doesn't match the other categories as one thing (I think the NameVariable case), which I think is fine and means you don't have to be constantly on top of new ISA extensions. It would be less desirable, e.g., for syntax highlighting during live editing where the visual feedback that you typed a real instruction is useful (not sure if rouge is used for that?).

It does seem like this is a problem that gets solved over and over. E.g., I use nasm highlighting in vscode from this which seems to derive its highlighting from a nasm parser from a Sublime text thing - the corresponding lexer grammar seems to be here.

bernardosulzbach · 2020-02-07T07:43:58Z

it doesn't have an exhaustive list of instructions but just treats anything that doesn't match the other categories as one thing (I think the NameVariable case), which I think is fine and means you don't have to be constantly on top of new ISA extensions. It would be less desirable, e.g., for syntax highlighting during live editing where the visual feedback that you typed a real instruction is useful (not sure if rouge is used for that?).

I think this may backfire even if rouge is not used for live editing as "invalid" instructions would be highlighted as if they were valid instructions, making typos in example code more difficult to spot.

travisdowns · 2020-02-08T02:37:55Z

I think this may backfire even if rouge is not used for live editing as "invalid" instructions would be highlighted as if they were valid instructions, making typos in example code more difficult to spot.

Right, but I guess my claim was that code is likely to be copied from a live editing scenario, so be more or less free from typos.

That said, I agree with you: there is a downside to not including an instruction list. If I could choose, I'd definitely choose to have instructions highlighted, if it had no cost.

Here though, I was trying to strike a middle ground between unusable and perfect. IMO the current syntax highlighting is mostly unusable because parsing of common constructs (like index addressing) fails, leading to ugly red highlights in valid code. On my own blog, I just override the styles to basically disable the error style because it causes too much problems.

So I am happy for a middle ground solution which doesn't try to validate instruction mnemonics but just parses all unknown text in a consistent way, as if it could be instruction.

If there is interesting in doing the gold-plated solution, recognizing instructions: I am also happy to help, by e.g., providing a list. Keep in mind there are at least 1,000 instructions and they are growing all the time: Intel has some 20 ISA extensions in AVX-512 alone, comprising 100s of instructions and that train doesn't seem to be slowing down.

bernardosulzbach · 2020-02-08T23:37:02Z

That said, I agree with you: there is a downside to not including an instruction list. If I could choose, I'd definitely choose to have instructions highlighted, if it had no cost.

I agree about the extra complexity in maintaining an updated list of instructions, but I think that a highlighter, even more so one that is run statically (as rouge is probably most often used) should probably not cut corners for performance.

Here though, I was trying to strike a middle ground between unusable and perfect. IMO the current syntax highlighting is mostly unusable because parsing of common constructs (like index addressing) fails, leading to ugly red highlights in valid code.

Yes, but this is a different issue altogether. The failure to parse index expressions is certainly a bug that has to be addressed.

travisdowns · 2020-02-09T02:05:12Z

I agree about the extra complexity in maintaining an updated list of instructions, but I think that a highlighter, even more so one that is run statically (as rouge is probably most often used) should probably not cut corners for performance.

I wasn't suggesting the tradeoff was between functionality (instruction highlighting) and performance. I wasn't considering performance at all: I would guess it could be made fast enough.

I was suggesting that the tradeoff is instead against correctness and maintainability. A syntax highlighter that exhaustively lists instructions in order to highlight them becomes incorrect as soon as new mnemonics are introduced.

Personally, I would rather a highlighter that correctly highlights all instructions. However, in my experience (with various asm parsers) the choice is often between highlighting only some instructions and not highlighting other newer ones. This means when I write correct code, it is highlighted wrong. In that case, I'd prefer the simpler highlighter which doesn't know instruction names and is consistent.

My experience so far is that the size of the inspection between the sets of people who (1) know nasm syntax well enough to write a lexer and (2) know enough about Ruby and rouge and (3) have the time, energy and desire to work on this for free is small. Possibly zero. E.g. I probably have (1) and (3) but not (2).

So I was trying to err on the side of a smaller burden for making this change and especially a smaller ongoing maintenance burden, because I think more than that may be unrealistic.

travisdowns · 2020-02-09T02:09:22Z

Yes, but this is a different issue altogether. The failure to parse index expressions is certainly a bug that has to be addressed.

Agreed, but I think this issue was partly driven by that (both the issues linked at the top mention it), so it can inform the current situation. In particular it seems like the bracket parsing thing was actually reported first three years ago. That it has persisted for so long is a strong indication to me that we should err on the side of "dumb, simple, will work without updates" for any change. IMO ofc.

bernardosulzbach · 2020-02-10T10:36:06Z

(...) So I was trying to err on the side of a smaller burden for making this change and especially a smaller ongoing maintenance burden, because I think more than that may be unrealistic. (...) That it has persisted for so long is a strong indication to me that we should err on the side of "dumb, simple, will work without updates" for any change.

I see your point now, and I think you are right. We should strive to get something that just works.

pyrmont · 2020-02-10T10:37:00Z

Sorry for the delay on this one, guys. Am aiming to have a look at it this evening! Thank you for the spirited discussion :)

pyrmont · 2020-02-10T20:06:56Z

I took the plunge and rewrote the lexer on the basis of Chroma's implementation. The visual sample looks like it's highlighting correctly to me. Thoughts?

bernardosulzbach · 2020-02-12T17:22:08Z

This seems to have fixed the issue. Thank you.

pyrmont · 2020-02-16T22:31:45Z

@bernardosulzbach Thanks for the review and sorry this took so long. Hopefully this improves the situation for users :)

This will be included in the next version of Rouge, 3.17.0. That one is scheduled to go out on 17 March 🎉

bernardosulzbach · 2020-02-16T22:37:56Z

@bernardosulzbach Thanks for the review and sorry this took so long. Hopefully this improves the situation for users :)

This will be included in the next version of Rouge, 3.17.0. That one is scheduled to go out on 17 March

Thank you for porting it! I think this will work much better and will not need any workarounds such as what @travisdowns mentioned because of highlighting errors.

travisdowns · 2020-03-28T19:51:17Z

@pyrmont thanks so much! Sorry I've been silent for a while, I've been busy but finally have time to check this out.

I think I can use the dingus to check this out, since the dingus has v3.17, right?

Is there anything that gets parsed as an error? Some stuff shows in red in the dingus, but not sure if that's error. See screenshot:

(presumably those shouldn't be errors)

pyrmont · 2020-03-31T07:26:45Z

@travisdowns No worries. The text is red there because of the theme used on the Dingus. An error would show as a red background (example).

pyrmont added 3 commits February 6, 2020 03:15

Add examples of addressing expressions

93928eb

Update wrapping of keywords

916ea99

Treat brackets as punctuation

4c29dd9

pyrmont self-assigned this Feb 5, 2020

pyrmont added the needs-review The PR needs to be reviewed label Feb 5, 2020

pyrmont mentioned this pull request Feb 5, 2020

NASM addressing lexing fails on most types of addresses #1310

Closed

bernardosulzbach reviewed Feb 5, 2020

View reviewed changes

lib/rouge/lexers/nasm.rb Outdated Show resolved Hide resolved

Rewrite NASM lexer using Chroma lexer as reference

9dd9c8f

pyrmont marked this pull request as ready for review February 10, 2020 20:07

pyrmont merged commit 36f1b72 into rouge-ruby:master Feb 16, 2020

pyrmont deleted the bugfix.nasm-lexer-fixes branch February 16, 2020 22:30

pyrmont removed the needs-review The PR needs to be reviewed label Feb 16, 2020

harm-smits mentioned this pull request Mar 26, 2020

[IDE] - Highlighting is broken ajkhoury/JetBrains-NASM-Language#28

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve NASM lexer #1428

Improve NASM lexer #1428

pyrmont commented Feb 5, 2020

bernardosulzbach commented Feb 5, 2020

bernardosulzbach commented Feb 5, 2020

travisdowns commented Feb 5, 2020

bernardosulzbach commented Feb 7, 2020

travisdowns commented Feb 8, 2020 •

edited

Loading

bernardosulzbach commented Feb 8, 2020

travisdowns commented Feb 9, 2020

travisdowns commented Feb 9, 2020

bernardosulzbach commented Feb 10, 2020

pyrmont commented Feb 10, 2020

pyrmont commented Feb 10, 2020

bernardosulzbach commented Feb 12, 2020

pyrmont commented Feb 16, 2020

bernardosulzbach commented Feb 16, 2020

travisdowns commented Mar 28, 2020 •

edited

Loading

pyrmont commented Mar 31, 2020

Improve NASM lexer #1428

Improve NASM lexer #1428

Conversation

pyrmont commented Feb 5, 2020

bernardosulzbach commented Feb 5, 2020

bernardosulzbach commented Feb 5, 2020

travisdowns commented Feb 5, 2020

bernardosulzbach commented Feb 7, 2020

travisdowns commented Feb 8, 2020 • edited Loading

bernardosulzbach commented Feb 8, 2020

travisdowns commented Feb 9, 2020

travisdowns commented Feb 9, 2020

bernardosulzbach commented Feb 10, 2020

pyrmont commented Feb 10, 2020

pyrmont commented Feb 10, 2020

bernardosulzbach commented Feb 12, 2020

pyrmont commented Feb 16, 2020

bernardosulzbach commented Feb 16, 2020

travisdowns commented Mar 28, 2020 • edited Loading

pyrmont commented Mar 31, 2020

travisdowns commented Feb 8, 2020 •

edited

Loading

travisdowns commented Mar 28, 2020 •

edited

Loading