Improve lexing of ternaries that include symbols in Ruby lexer #1476
I haven't taken a super close look at this, but definitely check the year on that stackoverflow post - Ruby's syntax has changed a great deal, and you don't need backslashes for code like this:
...also
Hang on, I take that back.
a?b :c # parsed as a? (b) (:c), syntax error for missing comma
a?b:c # parsed as a?(b: c), b is a symbol
a ?b:c # parsed as (a) ? (b) : (c), a proper ternary operator
a ?b :c # same as above
@jneen Perhaps I'm now optimising for the tests but the samples are now all lexed correctly. This isn't counting the lexing of methods ending in |
…-ruby#1476)

Ruby's rules for how it parses ternaries are complicated. This is all the more the case if the ternary contains symbols. The current lexer uses a simple test to determine whether a colon demarcates the branches of the ternary: is the colon immediately followed by another colon? While this rule suffices for many cases, it causes ternaries including symbols to be lexed incorrectly.

This commit replaces the simple rule with a rule for each of the following cases:

- **Simple case**: The simple case is where there is whitespace following the colon being matched.
- **Complex case**: The complex case is where there are no additional colons on that line (excluding colons in trailing comments) that follow the colon being matched.

If either of the above cases applies, the colon is tokenised as `Punctuation` and the lexer moves to the `:expr_start` state.

These rules have been tested against a number of complex ternaries involving colons, and the resulting tokens are consistent with Ruby's parser. These test cases have been added to the visual sample.
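As a rough illustration of the two cases described in the commit message, here is a minimal sketch in plain Ruby. It is not the rule added to the lexer: `ternary_separator?` is a hypothetical helper, the comment stripping is deliberately naive (a `#` inside a string literal would fool it), and it judges a single colon in isolation. The real lexer also tracks state (for example, moving to `:expr_start` after the separator so that a following `:c` is lexed as a symbol), which this sketch does not model.

```ruby
# Hypothetical helper (not part of Rouge): decides whether the colon at
# index `pos` of a single source line would count as a ternary separator
# under the two cases described in the commit message.
def ternary_separator?(line, pos)
  return false unless line[pos] == ':'

  rest = line[(pos + 1)..-1]

  # Simple case: whitespace directly follows the colon.
  return true if rest.match?(/\A\s/)

  # Complex case: no further colons on the line, ignoring a trailing
  # comment. Assumption: everything from the first '#' onwards is a comment.
  code = rest.sub(/#.*/, '')
  !code.include?(':')
end

line = 'a ? :b : :c'
first  = line.index(':')            # the colon that starts the symbol :b
second = line.index(':', first + 1) # the colon between the two branches

puts ternary_separator?(line, first)   # more colons follow, so not a separator
puts ternary_separator?(line, second)  # followed by whitespace, so a separator
```

Applied to the colon in `a ?b:c`, the helper falls through to the complex case, since no whitespace follows the colon and no further colons appear on the line.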
Ruby's rules for how it parses ternaries are complicated. This is all the more the case if the ternary contains symbols. The current lexer uses a simple test to determine whether a colon demarcates the branches of the ternary: is the colon immediately followed by another colon? While this rule suffices for many cases, it can lex ternaries that include symbols incorrectly.
This PR replaces the simple rule with two rules:
- Simple case: The simple case is where there is whitespace surrounding the colon. If this is the case, the colon demarcates the branches of the ternary.
- Complex case: The complex case tests whether there are no more colons or backslashes on the line. If there aren't any, the colon should be treated as demarcating the branches (see this StackOverflow question for edge case examples).
This fixes #1038.