Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve lexing of ternaries that include symbols in Ruby lexer #1476

Merged
merged 7 commits into from
Apr 14, 2020

Conversation

pyrmont
Copy link
Contributor

@pyrmont pyrmont commented Apr 3, 2020

Ruby's rules for how it parses ternaries are complicated. This is all the more the case if the ternary contains symbols. The current lexer uses a simple test to determine whether a colon demarcates the branches of the ternary: is the colon immediately followed by another colon. While this rule suffices for many cases, it can have problems with symbols.

This PR replaces the simple rule with two rules:

  1. Simple case: The simple case is where there is whitespace surrounding the colon. If this is the case, the colon demarcates the branches of the ternary.

  2. Complex case: The complex case tests whether there are no more colons or backslashes on the line. If there aren't any, the colon should be treated as demarcating the branches (see this StackOverflow question for edge case examples).

This fixes #1038.

@pyrmont pyrmont added the needs-review The PR needs to be reviewed label Apr 3, 2020
@pyrmont pyrmont self-assigned this Apr 3, 2020
@pyrmont
Copy link
Contributor Author

pyrmont commented Apr 3, 2020

@jneen It doesn't look to me like this causes any regressions in the visual sample (although I take your point in #1038 that this is an exceptionally tricky area to lex correctly). Do you have any thoughts?

@jneen
Copy link
Member

jneen commented Apr 4, 2020

I haven't taken a super close look at this, but definitely check the year on that stackoverflow post - Ruby's syntax has changed a great deal, and you don't need backslashes for code like this:

condition
  ? antecedent
  : alternative

@jneen
Copy link
Member

jneen commented Apr 4, 2020

...also a?b:c is completely valid. ><

@jneen
Copy link
Member

jneen commented Apr 4, 2020

Hang on I take that back.

a?b :c # parsed as a? (b) (:c), syntax error for missing comma
a?b:c # parsed as a?(b: c), b is a symbol
a ?b:c # parsed as (a) ? (b) : (c), a proper ternary operator
a ?b :c # same as above

@pyrmont
Copy link
Contributor Author

pyrmont commented Apr 5, 2020

@jneen Perhaps I'm now optimising for the tests but the samples are now all lexed correctly.

ternaries

This isn't counting the lexing of methods ending in ?. Those aren't matched by the ternary rules but are subtly broken, I think. That seems like something that should be solved in a different PR.

@pyrmont pyrmont merged commit 7bf2159 into rouge-ruby:master Apr 14, 2020
@pyrmont pyrmont deleted the bugfix.ruby-ternaries-symbols branch April 14, 2020 07:30
@pyrmont pyrmont removed the needs-review The PR needs to be reviewed label Apr 14, 2020
mattt pushed a commit to NSHipster/rouge that referenced this pull request May 21, 2020
…-ruby#1476)

Ruby's rules for how it parses ternaries are complicated. This is all
the more the case if the ternary contains symbols. The current lexer
uses a simple test to determine whether a colon demarcates the branches
of the ternary: is the colon immediately followed by another colon? 

While this rule suffices for many cases, it causes ternaries including
symbols to be lexed incorrectly. This commit replaces the simple rule
with a rule for each of the following cases:

- **Simple case**: The simple case is where there is whitespace
  following the colon being matched.

- **Complex case**: The complex case is where there are no additional 
  colons on that line (excluding colons in trailing comments) that 
  follow the colon being matched.

If either of the above cases apply, the colon is tokenised as
`Punctuation` and the lexer moves to the `:expr_start` state.

These rules have been tested with a number of complex ternaries
involving colons and are lexed in a manner consistent with Ruby's
parser. These test cases have been added to the visual sample.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ruby Ternary doesn't identify symbols correctly
2 participants