Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP Full statement syntaxing #156

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

msftrncs
Copy link
Contributor

@msftrncs msftrncs commented Feb 3, 2019

This PR demonstrates a nearly complete conversion of the syntax rules for the purpose of full statement syntaxing. Again, this PR is for feedback, and not entirely intended to be accepted.

This would resolve issues such as #152 and #153, and provides a little more theme opportunity, as now function calls can be scoped.

image

In the above, you can see that writeproperty, writeXMLvalue and writevalue are scoped as a function call. (orange in the VS Code Monokai Dimmed theme my theme is based on) The red text is exceptions that I have yet to handle, so there is some work left.

@msftrncs
Copy link
Contributor Author

msftrncs commented Feb 3, 2019

There are a few issue remaining:

  • in keyword of foreach not handled
  • else or elseif not handled if used on same line as the if
  • switch parameters not handled, including 'file' argument
  • numerics (positive only) as an argument to a function can invoke an expression (operators) highlight, when none should be possible.
  • if a line ends with a variable, the next line will scope as a continuation.
  • if a subexpression/expression ($() or ()) ends with a variable, the remainder of the line fails to process, locking in any scopes that might have otherwise ended. Might not be related to above.
    Edit: Turned out to be a combination of above, and special case '{foreach/where/sort/tee}-object' cmdlets. A workaround was found for the \G behavior issue.
  • interpolated subexpression behavior does not match PowerShell, the latter quirkily ends any string or comment within the subexpression if the closing parenthesis (unmatched within the sub-string or comment) comes along first.
  • When a variable starts an unquoted argument, trailing certain characters scope as their normal meaning, instead of becoming part of an unquoted expandable string.
  • commands that start with a single '-' (any dash), the dash is incorrectly treated as a operator.
  • '&' submit job terminator incorrectly treated as the unary invoke operator
  • operators (binary or pre-unary) fail to offer a line continuation to get to the next operand, which only obviously affects enum and hashtable.
  • method calls are scoping command calls directly within their (…), when they should not
  • UPDATED: VS CODE: a type reference used immediately after many of the operators is instead treated as an index operator. This is a \G issue that is not easy to get around in this case. Update: an improvement to prevent accessors from continuing to fire after white space behind a true property has serendipitously resolved this issue.
  • a colon : after a parameter name scopes as unquoted text, when actually it is a separator, this causes an immediately following variable to scope a member as unquoted text instead of being a member.
  • when an argument is started with a consecutive number of variables, successive variables should not allow member access unless all the ones before it did. echo $a$b$c[1], the [1] is unquoted text, not an index accessor. (corrected in ab8e463)
  • a '-' or '+' operator between a variable and a number in an expression, without space before the number, will be treated as a part of the number. (in $a+3 or $a +3 the +3 will be treated as 'constant.numeric' instead of splitting the operator out, the latter example happens in the original grammar as well)
  • on the other hand, the first '-' immediately after an operand is incorrectly scoped as a unary negation (as long as it doesn't immediately precede a numeric value).
  • methods in classes scope differently if a block comment appears between the method name and its parameter definitions.
  • negate (-) and inc/dec (++/--) operators not scoping correctly prefix vs postfix syntaxes, affects line continuation scoping.
  • 'configuration' not currently handled
  • Various blocks (begin / end / process / parallel, etc) will highlight member access following their statement block, when it should not as these are statement blocks, not scriptblocks. (Most statement blocks already have this handled as of 8254786)
  • The first token after the invoke operator & can scope as a parameter instead of being forced to be a function/program name argument.
  • class, enum and the 'functions' can inadvertently repeat their blocks if they appear consecutively without spaces, even after line breaks. This affects how the block(s) that follow are scoped, and prevents accessors from working on those blocks.
  • ATOM: operators '+' or '-' fail to scope as such when immediately preceding numeric values (of which are the right operand), instead scoping as part of the numeric value. This was working correctly at one time.
  • Support needed for unknown attributes. Currently using attributes other than the PowerShell standard attributes ('flags()', 'parameter()', 'cmdletbinding()', 'alias()', etc...) result in being scoped invalid, and the master branch doesn't scope them at all. My thought is that the standard ones need to be scoped as 'support', while the unknowns should be 'storage.type.attribute'. Most theme's would then just color them the same as any other non-support type.
  • PowerShell 7 Feature: Pipe on a new line can be allowed to continue an unterminated command or expression from a previous line. Previously a line was terminated by the EOL if it wasn't escaped, and a pipe at the start of the following line would be considered 'empty' which is invalid.
  • PowerShell 7 Feature: ? : ternary operator
  • hash table element assignments should be statement mode, but any statement block based statements will derail subsequent assignments, if statement being the worst offender.
  • pipe after a [type] is incorrectly marked as an 'empty' pipeline.

There may be more. There are probably ways to get unexpected results, but that listed above is what I see from the normal code, or code examples, I have here.

UPDATE: VS Code's TextMate implementation causes an operator that immediately follows (without separation) a numeric value which happens to be the first token of an expression statement to incorrectly scope as a function invocation. This has to do with the use of \G. Fixed in VS Code 1.35.

@msftrncs
Copy link
Contributor Author

msftrncs commented Feb 3, 2019

A JSON version of the syntax is available at https://github.com/msftrncs/PowerShell.tmLanguage/tree/argumentmode_2ndtry, I produced this PList version from there,

@msftrncs msftrncs changed the title Full statement syntaxing WIP Full statement syntaxing Feb 4, 2019
@msftrncs
Copy link
Contributor Author

I was able to test this grammar on Atom, though only to confirm that VS Code does not behave the same with \G. The current version appears to work really well with Atom, though I have not unrolled every work-around. I do have PR's in with VSCode-TextMate to improve the behavior in VS Code.

@msftrncs
Copy link
Contributor Author

msftrncs commented Mar 8, 2019

I finally have IF/ELSEIF/ELSE working the way they should. This was a huge lesson in handling the grammar. The lesson learned is to 'always march forward'. Retreating the grammar rules needs to occur in only the right conditions to ensure inner sections of rules cannot reapply to similar but unrelated constructs. In this case, the only sure way out of a rule (or an entire stack of rules) is at the end of a line.

There are still quite a few things I think should be worked on to make the grammar right, and with this lesson it will be easier to do that now.

@TylerLeonhardt
Copy link
Member

From the screenshots you've sent I'd say it's looking good!

@msftrncs
Copy link
Contributor Author

msftrncs commented Mar 10, 2019

This shot shows the recent commits work towards the statement style scoping, how the same keyword is scoped differently in different contexts.

image
(Updated image to show labels, do-until loop.)

This is with VS Code. I was disappointed to find out that something is different in Atom and it doesn't fully work. I think Atom may not be re-running the rules against the 'EOL' as VS Code does, and I am specifically making use of this. In VS Code, $ matches either before or after the \n, and VS Code gives the current rule stack a chance to match even after the \n has been consumed, just for that possibility.

@msftrncs
Copy link
Contributor Author

I got a solution for the deviation in Atom worked out in fd37e90, dealing with chained statements (if/elseif/else or try/catch/finally or any statement that uses a statement block).

@msftrncs
Copy link
Contributor Author

msftrncs commented Mar 30, 2019

Showing the result of the changes in the last commit: (Edit, 'master' on left, this PR on right, same theme, after the @Splat regression was fixed and improved)
image

@msftrncs
Copy link
Contributor Author

The most recent commit reaches my long desired goal of scoping all the auto/language variables regardless of the syntax used to get them:

image

@msftrncs
Copy link
Contributor Author

Guess I should have been looking at all the tests in the PowerShell repository. Evaluating them for some work I am doing in PowerShell/PowerShell, and I noticed that I am missing allowing full statements behind hashtable assignments, and so when I tweaked things to allow that, it shows a huge flaw in the if/elseif/else syntaxing logic. That, amongst a few other things.

@msftrncs msftrncs force-pushed the FullStatmentSyntaxing branch from 4d22dad to dd1791b Compare October 16, 2019 02:34
@msftrncs
Copy link
Contributor Author

I have rebased and squashed, as the commit history at this point is mostly clutter, the entire syntax file has been practically completely overhauled by this PR. There is still a lot to do, as I would still need to apply scope adjustments and fix tests to bring this up to par.

@RobertKlohr
Copy link

I tested this today to see if and how it might fix issue #194 I created. In it's current state it fixes all of the problems I was seeing except two.

  1. The parameter for the .PARAMETER keyword does not highlight. I don't fully understand TextMate so I may not use the correct terms here but it seems the capture name "entity.name.parameter.documentation.powershell" is either incorrect or is not implemented in the default theme,
  2. The parameter for the .LINK keyword (on the line below) does not highlight. I can see that there is just no code to do highlighting for the parameter for this keyword.

@msftrncs
Copy link
Contributor Author

@rjkgithub,

For 1, its a theme thing. entity.name.parameter is not very common, as in the end, most objects becomes variable. Defining this scope however in this PR, is what makes all of PowerShell's parameter uses stand out.

image

image

For 2, I am not sure there is any reason to scope it, nor a clear scope to specify. Many editors already do something about links.

image

@RobertKlohr
Copy link

@msftrncs
I agree with your assessment of the LINK keyword. It is a weird edge case and Get-Help will work with anything you put there so I don't see any reason to add it to the scope.

For the issue about the theme not supporting entity.name.parameter should that be opened as an issue with VSCode or the theme directly if it has a separate repository? Asking as I am using the default theme that ships with VSCode which I would expect to support all all of the possible syntax highlighting tokens.

Also do you have a suggestion for a dark theme that does support all (or at least more) of the syntax token highlighting?

@msftrncs
Copy link
Contributor Author

Looking at things closer, its looking like parameter is preferred to be a variable sub-scope instead of under entity. The sublime scope recommendations lists variable.parameter. However, a lot of themes just color variable so it all ends up one color. Monokai and Tomorrow Night both seem to have coloring for variable.parameter though. Myself I just customized Monokai Dimmed.

@RobertKlohr
Copy link

When I manually edited the sub-scope to variable for the .PARAMETER keyword capture all of the built-in themes (VSCode) I tested colored the keyword and parameter as two different colors. It seems like changing from entity to variable is the best approach.

@MartinGC94
Copy link

This is amazing, the accuracy is almost as good as the real syntax highlighting the ISE provided (at least with "normal" powershell code that doesn't try to be hard to read).

The only issue I have found is that functions that don't use the standard verb-noun format share the same scope as the function name in function definitions. I've "fixed" this by replacing the regex on this line: https://github.com/msftrncs/PowerShell.tmLanguage/blob/ed163f4c9a7247acb7ab3deb24865dfc9061b326/powershell.tmLanguage.json#L2433 with this: [?|%|\\w][-|\\w|\\.]* and it doesn't seem to have a negative impact on any other scopes but I don't know enough about textmate grammar to know for sure.

The only issue with this "fix" is that doesn't work work if the command is prefixed by a path like .\ping.exe and if I try to change it to also account for paths it affects other scopes as well.

@msftrncs
Copy link
Contributor Author

msftrncs commented Mar 3, 2021

@MartinGC94, thanks for the feedback, and sorry for the very long delay.

Myself, I prefer if all commands or functions scope the same, unless they are truly language standard (built-in) functions. Unfortunately in PowerShell, nothing is really standard, as everything can be replaced, so I left the original 'approved verb'-noun concept scoping as support.function and have everything else go in to entity.name.function. PowerShell ISE and PSReadLine both just classify everything as function (because they use PowerShell internals that actually is labeled Command), except method calls, because those are part of a variable. Of course the main goal was to get significantly better coverage of the syntax than was previously being handled.

I've been trying to work on really technical aspects, such as operator scoping, because that is where this still differs the most from ISE or PSReadLine, even though its not obvious. I just realized this is occurring:

image

The ++ is scoping as a 'pre-unary' operator which expects an operand afterwards, and it eats the if incorrectly, thinking its an invalid operand.

@andyleejordan
Copy link
Member

Hey, sorry, I'm definitely catching up here. Is this PR intended / ready to be merged? What are the risks with it? Clearly it seems to have a lot of benefits. FYI @rjmholt @SydneyhSmith

@MartinGC94
Copy link

Good to see you are back to working on this, I was a little afraid the semantic highlighting had made you drop this but like I said before the earlier implementation with my change was pretty much perfect to me.

I'm not really sure I follow the logic in having different scopes for commands that follow the verb-noun pattern vs commands that don't. The point of syntax highlighting is to help me see how my text is interpreted by the language, right? So if PS thinks "ls" and "Get-ChildItem" is the same then why shouldn't my editor show me that? Even more confusingly if you use an unapproved verb it also gets a different color/scope "Get-ChildItem" VS "Validate-Item".
I think the most logical scoping is to have all commands share the same scope regardless of their naming scheme and treat the function name part of a function declaration as an argument for the "function" keyword similar to the way ISE does it.

@JustinGrote
Copy link

@andschwa the goal of this PR was to fix up the Textmate grammar. I think the semantic highlighting has probably eschewed a lot of the usefulness of this (it was created before semantic highlighting was an option) but it's probably still worth merging. The main risk would be for users who are "used to" old stylings and the problem quirks and have not turned on semantic highlighting, or have done some very specific custom themes, will be affected and have their colors changed potentially, but they can always re-adjust them back "how they like them" only now with proper symbols.

@andyleejordan
Copy link
Member

@JustinGrote This Textmate grammar is used beyond the VS Code PowerShell extension though (and if I'm understanding it right, we do use it but as a fallback, with semantic highlighting on top), most prominently by GitHub's highlighting of PowerShell files, so it seems to me we should maintain and improve it regardless of the introduction of semantic highlighting to the extension.

@msftrncs Regarding:

I'm not really sure I follow the logic in having different scopes for commands that follow the verb-noun pattern vs commands that don't. The point of syntax highlighting is to help me see how my text is interpreted by the language, right? So if PS thinks "ls" and "Get-ChildItem" is the same then why shouldn't my editor show me that? Even more confusingly if you use an unapproved verb it also gets a different color/scope "Get-ChildItem" VS "Validate-Item".
I think the most logical scoping is to have all commands share the same scope regardless of their naming scheme and treat the function name part of a function declaration as an argument for the "function" keyword similar to the way ISE does it.

While I've not been involved in the design or discussion to this point, I think I agree with your point here that syntax highlighting is intended to show the reader how the text is interpreted by the language, so I wouldn't want approved/unapproved verbs (and therefore aliases) being rendered with different colors. (At least not be default!) Is that something blocking this PR?

@JustinGrote
Copy link

@andschwa I agree on all counts.

@msftrncs
Copy link
Contributor Author

Honestly the approved verb command scoping is carried over from the current release.

There are two different scopes at play here for commands:

  • support.function
  • entity.name.function

While entity.name.function isn't very specific, it is what all other language's grammars use for generic functions, both for definition and for invocation. On the other hand, support.function is used to indicate well known, usually built in or language specification driven functions, that are in many cases are considered 'keywords' even if not reserved, so they often get themed as keywords. The usefulness here of using support.function only for approved verbs, is the immediate recognition that you have used an approved verb. and that is probably about it. In other words, it meets the idea for 'well known' or 'language specification'.

I will admit that there needs to be some scoping of something along the lines of meta.command.name specifically for command invocations. Function definitions should also receive a meta scope (meta.function.definition is what TS/JS uses). Some themes specifically differentiate the definition portion from an invocation via the meta scopes, often with an attribute such as underlining or italicizing.

Comparisons to ISE are slightly flawed. ISE is limited entirely to the tokenizing of PowerShell script by the PowerShell engine (the same as PSReadLine), and is not really representative of higher granularity offered by textmate language grammars. Further I believe that the tokenizing of function names in the function definition as an argument is one of those items that PowerShell's team would have liked to have done different, as witnessed in the difference of how enum and class's definitions are handled, being much newer concepts. Unfortunately changing the definitions of the tokenizer would be a breaking change. For instance consider that PowerShell considers a method invocation to just be a 'member' access, and thus ISE doesn't differentiate it any differently than a property, where as most other language grammars treat them separately, with a method invocation being marked entity.name.function, instead of variable.other.object.property.

There is definitely a need to consider the effects of this PR on the ISE theme, as I have given it no concern so far.

@MartinGC94
Copy link

MartinGC94 commented Aug 5, 2024

@msftrncs Do you think you'll get back to this in the near future? If not, I might take a stab at learning textmate grammar and try to write my own syntax.

-Edit: I took a look at it and figured it will take more time and effort than I'm willing to put into this. If nobody from the community steps up then I hope Microsoft as a company will because the current grammar is awful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants