JSON rewrite #1029

miparnisari · 2018-11-03T21:02:41Z

miparnisari · 2018-11-03T21:03:17Z

Fixes #819

ffes · 2019-01-24T10:25:53Z

I hope one of the members of this project finds the time to review this

pyrmont · 2019-05-29T00:13:53Z

@miparnisari I'm sorry this has been outstanding for so long :( I'm working my way through the backlog and hope to get to this soon!

lib/rouge/lexers/json.rb

jneen · 2019-05-29T23:55:21Z

For the json lexer in particular, performance on large inputs is important. Would you mind testing with larger data? I believe if you search the issues for "json" you will find a few issues that prompted us to make the lexer extremely simple.

Also @pyrmont for avoiding future performance issues I wonder if it might be useful to have some kind of max_length option for the lex - where if the input is longer than the max length, it will give up and highlight the rest as Text. For use cases like Gitlab where they can't control the code this sort of thing could be very helpful.

pyrmont · 2019-05-30T02:38:29Z

I had a look through the issues and while I think I found the ones that @jneen is referring to (see #41, #114, #244), they don't actually include an example of the JSON being tested.

I made one using Mockaroo and have put it in a gist. It's not as large as the 200KB one referenced in #41 but you could always just double it if a larger file is desired.

I'm personally curious what the speed difference is with newer versions of Ruby and will try to give it a spin a bit later.

pyrmont · 2019-05-30T02:40:08Z

@jneen Yeah, a max_length option sounds like an idea worth serious consideration. Will have a bit of a think about this, too.

jneen · 2019-05-30T03:09:55Z

Now that I look closely, I think #525 is relevant (the actual issue there was with the JSON lexer)

jneen · 2019-05-30T03:12:37Z

the offending case was a string that contained a large number of backslashes: https://github.com/jneen/rouge/files/410135/bad_note.txt

miparnisari · 2019-05-30T03:52:47Z

I'm personally curious what the speed difference is with newer versions of Ruby and will try to give it a spin a bit later.

Can you please edit the README to include instructions on how to measure performance of two lexers? Or even in the PR template.

pyrmont · 2019-05-30T04:18:36Z

@miparnisari wrote:

Can you please edit the README to include instructions on how to measure performance of two lexers? Or even in the PR template.

I don't think generally we'd expect submissions to include speedtests; this felt like a special case because of: (1) the prevalence of JSON; and (2) the fact JSON files might realistically be quite large.

That said, providing some 'official' support for benchmarking doesn't sound like a bad idea. I was planning on using the Shootout library mentioned in #114 and will try to see how simple it can be made to test different versions of Rouge (it's really designed to benchmark Rouge against different syntax highlighting libraries which isn't quite what I'm concerned with). To compare versions of Rouge, I was anticipating switching between git branches and then running the tests.

Oh, and as a bonus, I realised the 200KB JSON file is in there.

miparnisari · 2019-05-30T04:49:58Z

a max_length option sounds like an idea worth serious consideration. Will have a bit of a think about this, too.

Agree. I know my company has this logic in place - if the file is too big, it bails out and renders it as plain text.

I don't think generally we'd expect submissions to include speedtests;

Agree. It could be an optional input of the PR.

For the json lexer in particular, performance on large inputs is important. Would you mind testing with larger data?

I tried the 200 kb file. When I use rackup, both this and the existing lexer take around 1.5 sec. (I hope this is correct, I looked at the last number in the request log)

pyrmont · 2019-05-30T19:04:52Z

I got Shootout working and ran it against both this PR and the current master.

                      master               PR #1029
JSON (217 kB)
=> terminal           1209 kB/s            1039 kB/s
=> html               1454 kB/s            1203 kB/s

There's definitely a slow down: terminal is 14% slower and HTML is 17% slower. That's not great but I'm inclined to think this is acceptable given the fact this has been requested multiple times and does seem like an overall usability win.

What do you think @jneen?

jneen · 2019-05-31T09:18:17Z

The code for rouge.jneen.net (in this same org) contains a class that can be used to run different versions of rouge in the same runtime, which might be helpful.

pyrmont · 2019-07-09T04:47:38Z

@miparnisari Thanks for keeping this up to date. I plan to have a look tonight at the case @jneen linked to earlier. I never did actually check that.

pyrmont · 2019-07-10T07:50:51Z

@miparnisari Sorry, I didn't mean to submit two commits in quick succession but as I was writing a message to explain why the code didn't support highlighting what I'm calling 'bare' key-value pairs for performance reasons, I realised there was a way to do it that didn't involve excess backtracking.

In summary, these commits makes several changes to the proposed lexer:

First, they change the tokens used in JavaScript objects. It was proposed that keys be tokenised as Str::Double and that string values be tokenised as Name::Tag. The token Name::Label seems a more appropriate token for keys while Str::Double should be used for string values.
Second, they remove the tokenisation of escape sequences inside object keys. While this tokenisation was correct insofar as a JSON parser would parse the string, it produces an odd result where Name::Label token can be interrupted by Str::Escape tokens.
Third, they add support for 'bare' key-value pairs, that is text of the form <string>: <value>. This preserves backwards compatibility with how the JSON lexer previously worked. These 'bare' key-value pairs are tokenised roughly the same way as they would be within an object. The difference is that escaped characters are not tokenised differently to other characters in a string. This seems an acceptable trade-off to me given that bare strings are not correct JSON in the first place and so how they should be tokenised outside of an object is not really defined.

Hopefully that all makes sense. I'll try to add some more examples tonight to the visual sample to demonstrate how this works.

lib/rouge/lexers/json.rb

pyrmont · 2019-07-11T19:48:21Z

@miparnisari My changes aren't working for subsequent bare key-value pairs. Need to have more of a think about this one.

This commit makes several changes to the proposed lexer. First, it changes the tokens used in JavaScript objects. It was proposed that keys be tokenised as `Str::Double` and that string values be tokenised as `Name::Tag`. The token `Name::Label` seems a more appropriate token for keys while `Str::Double` should be used for string values. Second, it remoeves the tokenisation of escape sequences inside object keys. While this is correct insofar as the keys are merely strings, it fits awkwardly with the tokenisation of keys with a non-string token. This commit tokenises all characters within the key as `Name::Label`. Third, it adds support for 'bare' key-value pairs, that is text of the form <string>: <value>. These 'bare' key-value pairs are not tokenised the same way as they would be within objects. Rather, keys are tokenised as ordinary strings. This preserves backwards compatability with how the JSON lexer previously worked.

This commit provides support for tokenising bare keys as `Name::Label` without introducing backtracking for non-key strings.

pyrmont · 2019-07-12T09:31:34Z

@miparnisari Sorry for overwriting your repository. It was the only way I could see to get the branch caught up to the latest version. I needed to do that because the HOCON lexer (which did not exist at the time you created this branch) subclasses the JSON lexer. Since we're pretty radically changing how it works, that broke that lexer.

It's all fixed now and looks to be making all those keys look nice and pretty :)

spec/lexers/json_spec.rb

lib/rouge/lexers/hocon.rb

miparnisari mentioned this pull request Jan 24, 2019

Use different color for the keys in JSON #819

Closed

pyrmont mentioned this pull request May 14, 2019

Import: JSON rewrite pyrmont/rouge#62

Closed

pyrmont added the needs-review The PR needs to be reviewed label May 29, 2019

pyrmont suggested changes May 29, 2019

View reviewed changes

pyrmont added author-action The PR has been reviewed but action by the author is needed and removed needs-review The PR needs to be reviewed labels May 29, 2019

miparnisari force-pushed the json-rewrite branch from e056675 to 3d42771 Compare May 30, 2019 04:35

pyrmont added needs-review The PR needs to be reviewed and removed author-action The PR has been reviewed but action by the author is needed labels May 30, 2019

pyrmont added maintainer-action The PR has been reviewed but action by a maintainer is required and removed needs-review The PR needs to be reviewed labels May 30, 2019

pyrmont approved these changes Jul 9, 2019

View reviewed changes

pyrmont added needs-review The PR needs to be reviewed and removed maintainer-action The PR has been reviewed but action by a maintainer is required labels Jul 9, 2019

ashmaroli reviewed Jul 10, 2019

View reviewed changes

lib/rouge/lexers/json.rb Outdated Show resolved Hide resolved

miparnisari and others added 7 commits July 12, 2019 16:50

JSON rewrite

d9cf996

JSON: add test, minor fixes

ecc25e4

Fix rubocop warnings "Ambiguous regexp literal."

22eb192

Improve bare key-value pair lexing

abe2d92

This commit provides support for tokenising bare keys as `Name::Label` without introducing backtracking for non-key strings.

Simplify rules

5a6295b

Update HOCON lexer to work with new JSON lexer

6b3de9a

pyrmont force-pushed the json-rewrite branch from 3ed514e to 6b3de9a Compare July 12, 2019 09:20

ashmaroli reviewed Jul 12, 2019

View reviewed changes

spec/lexers/json_spec.rb Outdated Show resolved Hide resolved

ashmaroli reviewed Jul 12, 2019

View reviewed changes

lib/rouge/lexers/hocon.rb Outdated Show resolved Hide resolved

lib/rouge/lexers/hocon.rb Show resolved Hide resolved

Fix typo in HOCON lexer

578cb56

pyrmont added author-action The PR has been reviewed but action by the author is needed and removed needs-review The PR needs to be reviewed labels Jul 18, 2019

Move pathological case to visual sample

94a024d

pyrmont merged commit 028f392 into rouge-ruby:master Jul 19, 2019

pyrmont removed the author-action The PR has been reviewed but action by the author is needed label Jul 19, 2019

MaximeKjaer mentioned this pull request Nov 17, 2019

Fix json-doc lexer #1364

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

JSON rewrite #1029

JSON rewrite #1029

miparnisari commented Nov 3, 2018

miparnisari commented Nov 3, 2018

ffes commented Jan 24, 2019

pyrmont commented May 29, 2019

jneen commented May 29, 2019

pyrmont commented May 30, 2019

pyrmont commented May 30, 2019

jneen commented May 30, 2019

jneen commented May 30, 2019

miparnisari commented May 30, 2019

pyrmont commented May 30, 2019

miparnisari commented May 30, 2019

pyrmont commented May 30, 2019 •

edited

Loading

jneen commented May 31, 2019

pyrmont commented Jul 9, 2019

pyrmont commented Jul 10, 2019 •

edited

Loading

pyrmont commented Jul 11, 2019

pyrmont commented Jul 12, 2019

JSON rewrite #1029

JSON rewrite #1029

Conversation

miparnisari commented Nov 3, 2018

miparnisari commented Nov 3, 2018

ffes commented Jan 24, 2019

pyrmont commented May 29, 2019

jneen commented May 29, 2019

pyrmont commented May 30, 2019

pyrmont commented May 30, 2019

jneen commented May 30, 2019

jneen commented May 30, 2019

miparnisari commented May 30, 2019

pyrmont commented May 30, 2019

miparnisari commented May 30, 2019

pyrmont commented May 30, 2019 • edited Loading

jneen commented May 31, 2019

pyrmont commented Jul 9, 2019

pyrmont commented Jul 10, 2019 • edited Loading

pyrmont commented Jul 11, 2019

pyrmont commented Jul 12, 2019

pyrmont commented May 30, 2019 •

edited

Loading

pyrmont commented Jul 10, 2019 •

edited

Loading