-
Notifications
You must be signed in to change notification settings - Fork 743
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve NASM lexer #1428
Improve NASM lexer #1428
Conversation
No thoughts, do you know if Chroma's NASM lexer is any good? |
In time, don't you think we should have a helper function that gets a set of lowercase strings and returns a set with all those strings and all of them in uppercase as well? This would remove a lot of repeated code, which would prevent us from missing one version of the instruction codes for some instructions and would make these Ruby files smaller and more readable. |
@pyrmont - I don't have any direct experience Chroma's lexer. It does seem to take a simple strategy at least with respect to instructions: it doesn't have an exhaustive list of instructions but just treats anything that doesn't match the other categories as one thing (I think the It does seem like this is a problem that gets solved over and over. E.g., I use nasm highlighting in vscode from this which seems to derive its highlighting from a nasm parser from a Sublime text thing - the corresponding lexer grammar seems to be here. |
I think this may backfire even if rouge is not used for live editing as "invalid" instructions would be highlighted as if they were valid instructions, making typos in example code more difficult to spot. |
Right, but I guess my claim was that code is likely to be copied from a live editing scenario, so be more or less free from typos. That said, I agree with you: there is a downside to not including an instruction list. If I could choose, I'd definitely choose to have instructions highlighted, if it had no cost. Here though, I was trying to strike a middle ground between unusable and perfect. IMO the current syntax highlighting is mostly unusable because parsing of common constructs (like index addressing) fails, leading to ugly red highlights in valid code. On my own blog, I just override the styles to basically disable the error style because it causes too much problems. So I am happy for a middle ground solution which doesn't try to validate instruction mnemonics but just parses all unknown text in a consistent way, as if it could be instruction. If there is interesting in doing the gold-plated solution, recognizing instructions: I am also happy to help, by e.g., providing a list. Keep in mind there are at least 1,000 instructions and they are growing all the time: Intel has some 20 ISA extensions in AVX-512 alone, comprising 100s of instructions and that train doesn't seem to be slowing down. |
I agree about the extra complexity in maintaining an updated list of instructions, but I think that a highlighter, even more so one that is run statically (as rouge is probably most often used) should probably not cut corners for performance.
Yes, but this is a different issue altogether. The failure to parse index expressions is certainly a bug that has to be addressed. |
I wasn't suggesting the tradeoff was between functionality (instruction highlighting) and performance. I wasn't considering performance at all: I would guess it could be made fast enough. I was suggesting that the tradeoff is instead against correctness and maintainability. A syntax highlighter that exhaustively lists instructions in order to highlight them becomes incorrect as soon as new mnemonics are introduced. Personally, I would rather a highlighter that correctly highlights all instructions. However, in my experience (with various asm parsers) the choice is often between highlighting only some instructions and not highlighting other newer ones. This means when I write correct code, it is highlighted wrong. In that case, I'd prefer the simpler highlighter which doesn't know instruction names and is consistent. My experience so far is that the size of the inspection between the sets of people who (1) know nasm syntax well enough to write a lexer and (2) know enough about Ruby and rouge and (3) have the time, energy and desire to work on this for free is small. Possibly zero. E.g. I probably have (1) and (3) but not (2). So I was trying to err on the side of a smaller burden for making this change and especially a smaller ongoing maintenance burden, because I think more than that may be unrealistic. |
Agreed, but I think this issue was partly driven by that (both the issues linked at the top mention it), so it can inform the current situation. In particular it seems like the bracket parsing thing was actually reported first three years ago. That it has persisted for so long is a strong indication to me that we should err on the side of "dumb, simple, will work without updates" for any change. IMO ofc. |
I see your point now, and I think you are right. We should strive to get something that just works. |
Sorry for the delay on this one, guys. Am aiming to have a look at it this evening! Thank you for the spirited discussion :) |
I took the plunge and rewrote the lexer on the basis of Chroma's implementation. The visual sample looks like it's highlighting correctly to me. Thoughts? |
This seems to have fixed the issue. Thank you. |
@bernardosulzbach Thanks for the review and sorry this took so long. Hopefully this improves the situation for users :) This will be included in the next version of Rouge, 3.17.0. That one is scheduled to go out on 17 March 🎉 |
Thank you for porting it! I think this will work much better and will not need any workarounds such as what @travisdowns mentioned because of highlighting errors. |
@pyrmont thanks so much! Sorry I've been silent for a while, I've been busy but finally have time to check this out. I think I can use the dingus to check this out, since the dingus has v3.17, right? Is there anything that gets parsed as an error? Some stuff shows in red in the dingus, but not sure if that's error. See screenshot: (presumably those shouldn't be errors) |
@travisdowns No worries. The text is red there because of the theme used on the Dingus. An error would show as a red background (example). |
As discussed in a number of issues (notably #483 and #1310), there are multiple problems with the NASM lexer. Part of me wonders if the better solution isn't to just throw out these rules and rewrite it based on Chroma's NASM lexer.
Thoughts @travisdowns and @bernardosulzbach?