Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance tuning for RegexLexer and HTML formatter #114

Merged
merged 1 commit into from
Nov 21, 2013

Conversation

korny
Copy link
Contributor

@korny korny commented Nov 21, 2013

TL;DR

Merge it to make Rouge 2x as fast.

Story

This branch is called "hiddensee" because that's where I am right now, on holiday ;-)

Continuing my tinkering with the internals of CodeRay and Rouge, I dug deeper into the DSL-/rule-based lexer system, comparing it to CodeRay's hand-written approach to find out where they differ and what's standing in the way of better performance. It took a few days, but I think I have a set of patches ready that make Rouge much faster than current master (especially faster than Pygments.rb) without changing any of the lexer's definitions.

I tried to reduce the changes to a minimum, and while some of them may have a relatively minor impact on performance, all of them together give Rouge a remarkable speed boost, running at least twice as fast in my benchmarks.

Benchmarks

The tests include example inputs in C, CSS, HTML, JavaScript, JSON, Perl, and Ruby.

hiddensee-shootout

Depending on input size, Pygments.rb is mostly faster than Rouge right now for files larger than 1kB. Below that, the additional overhead of switching to Python seems to slow it down.

With these patches, Rouge can match and surpass Pygments' speed, even for large files.

If you have Numbers, you can download the detailed result tables that I used to create the graph. All benchmarks were done with Ruby 2.0.0p247 on the Shootout v1.6.

Changes

Here's an overview of the changes, roughly ordered by the impact they have on performance:

Lexer / RegexLexer

  • Unroll deeply nested calls during rule recognition and rule execution. Especially:
    • Remove #with_output_stream. I found no reason for the use of Enumerator::Yielder; the whole thing with the nested callbacks seemed pretty convoluted, but perhaps I'm missing something?
    • Remove the block parameter from #step, because the value is static. Instead, @output_stream is set by #stream_tokens at the beginning.
    • The three main actions (token, state push and :pop!) are no longer using DRY method calls, but are optimized to manipulate the stack and output stream directly. This allows for several other optimizations, like knowing that the state is already a Symbol when pushing. They also make use of the given stream parameter.
    • ::get_state uses Symbols instead of Strings as keys for the states hash now, because we can directly try @states[next_state] when pushing.
    • Inline #run_rule and #run_callback into the #step method to reduce the number of method calls considerably, and use local variables as far as possible. This allows using next instead of return.
  • Using if @debug as a guard for debug statements, because evaluating an instance variable is much faster than calling a method with a block. This eliminates the need for the hack that redefines the method when it's called for the first time, but makes debug { … } lines a bit more ugly.
  • Optimize the hotspot, #step:
    • Use StringScanner#skip instead of StringScanner#scan, because it's faster, and the size of the match is more useful at this point than the match itself.
    • Check @null_steps only after incrementing it.
    • Use if instead of case in #step, and rely on rule being a Rule when it's not a State.
    • Make Rule#beginning_of_line an attribute instead of a method.
  • Define @states as a cache for self.class.states.
  • Define @null_steps beforehand because we need it anyway.
  • Set the default value for the second parameter of #token directly in the signature. Using a magic value like :__absent__ seems strange to me.
  • Token#name, #parent, and #shortname are now attributes instead of methods.

HTML formatter

  • Replace CGI.escape_html (which is quite slow) with a simplified version that only escapes &, <, and >.
    • In my opinion there's no need to escape ", ', and / outside of argument lists and script tags.
    • I left your TODO in because I'm not sure what it was about.
  • Optimize the hotspot, #span:
    • Combine yield calls as much as possible.
    • Don't use inspect to wrap the token shortname in quotes. Its definition could change in the future, and it's faster to add the quotes to the wrapping strings.
    • Call tok.shortname only once.
    • Use if instead of case.
  • Remove block parameter in #stream_tableized and span (it already uses yield instead).

Open issues

There are more things that I wanted to optimize:

  • The @group_count = 0 line in #step looks awfully wasteful, but I couldn't find a better way.
  • Do we really need the @null_steps check? It seems to be a good tool for debugging, but the Lexers shouldn't have such bugs in the first place. Silently failing here is also a bit weird…it would be great to get rid of this construction.

Cheers!

@korny
Copy link
Contributor Author

korny commented Nov 21, 2013

Pulling @siong1987, @robin850, and @mattr- into the discussion. Referencing jekyll/jekyll#930 and #41.

@korny
Copy link
Contributor Author

korny commented Nov 21, 2013

Another graph, this time comparing a typical file size of 1kB in different languages:

hiddensee-1kb

@mattr-
Copy link

mattr- commented Nov 21, 2013

Based on the benchmarks provided here, having this merged would get me very excited about including this in Jekyll.

@mattr-
Copy link

mattr- commented Nov 21, 2013

Also, @korny, this is nice work. Thanks for the detailed description and the benchmarks.

@jneen jneen merged commit cf83b49 into rouge-ruby:master Nov 21, 2013
@jneen
Copy link
Member

jneen commented Nov 21, 2013

Brilliant, merged. I'm okay with regex_lexer.rb being a bit ugly for the sake of performance, as long as it's still easy to write a lexer :).

@jneen
Copy link
Member

jneen commented Nov 21, 2013

I'd like to keep the @null_steps check unless there's a really pressing need to get rid of it. Ideally lexers would be bug-free, but that check ensures that the failure mode is "you get error tokens in the output" rather than "your server crashes because rouge infinite-looped". I'd be okay with it raising an error instead, for example, but I think the safety check is still necessary.

@jneen
Copy link
Member

jneen commented Nov 21, 2013

I killed RegexLexer#group (and hence the @group_count assignment) in c95a5d3 and d469551.

@robin850
Copy link
Contributor

Awesome work @korny! ❤️

@jneen
Copy link
Member

jneen commented Nov 21, 2013

I've got one or two lexers in the pipeline before I release (OCaml and SML in particular), and then I'll release 1.2.0.

@siong1987
Copy link

@korny 👍

@jneen
Copy link
Member

jneen commented Nov 26, 2013

This is released in v1.2.0.

lucidbee pushed a commit to BonsaiAI/rouge that referenced this pull request Mar 16, 2017
@pyrmont pyrmont mentioned this pull request May 30, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants