Performance tuning for RegexLexer and HTML formatter #114
Conversation
Pulling @siong1987, @robin850, and @mattr- into the discussion. Referencing jekyll/jekyll#930 and #41.
Based on the benchmarks provided here, having this merged would get me very excited about including this in Jekyll.
Also, @korny, this is nice work. Thanks for the detailed description and the benchmarks.
Brilliant, merged. I'm okay with …
I'd like to keep the …
Awesome work @korny! ❤️
I've got one or two lexers in the pipeline before I release (OCaml and SML in particular), and then I'll release …
@korny 👍
This is released in v1.2.0.
…uby#114 Thanks to @kevin-buttercoin for this fix!
TL;DR
Merge it to make Rouge 2x as fast.
Story
This branch is called "hiddensee" because that's where I am right now, on holiday ;-)
Continuing my tinkering with the internals of CodeRay and Rouge, I dug deeper into the DSL-/rule-based lexer system, comparing it to CodeRay's hand-written approach to find out where they differ and what's standing in the way of better performance. It took a few days, but I think I have a set of patches ready that make Rouge much faster than current master (and especially faster than Pygments.rb) without changing any of the lexer definitions.
I tried to reduce the changes to a minimum, and while some of them may have a relatively minor impact on performance, all of them together give Rouge a remarkable speed boost, running at least twice as fast in my benchmarks.
Benchmarks
The tests include example inputs in C, CSS, HTML, JavaScript, JSON, Perl, and Ruby.
Depending on input size, Pygments.rb is mostly faster than Rouge right now for files larger than 1kB. Below that, the additional overhead of switching to Python seems to slow it down.
With these patches, Rouge can match and surpass Pygments' speed, even for large files.
If you have Numbers, you can download the detailed result tables that I used to create the graph. All benchmarks were done with Ruby 2.0.0p247 on the Shootout v1.6.
Changes
Here's an overview of the changes, roughly ordered by the impact they have on performance:
Lexer / RegexLexer
- Removed `#with_output_stream`. I found no reason for the use of `Enumerator::Yielder`; the whole thing with the nested callbacks seemed pretty convoluted, but perhaps I'm missing something?
- The output stream is no longer passed to `#step`, because the value is static. Instead, `@output_stream` is set by `#stream_tokens` at the beginning.
- The standard callbacks (like `:pop!`) are no longer using DRY method calls, but are optimized to manipulate the stack and output stream directly. This allows for several other optimizations, like knowing that the state is already a Symbol when pushing. They also make use of the given `stream` parameter.
- `::get_state` uses Symbols instead of Strings as keys for the `states` hash now, because we can directly try `@states[next_state]` when pushing.
- Inlined `#run_rule` and `#run_callback` into the `#step` method to reduce the number of method calls considerably, and use local variables as far as possible. This allows using `next` instead of `return`.
- Use `if @debug` as a guard for debug statements, because evaluating an instance variable is much faster than calling a method with a block. This eliminates the need for the hack that redefines the method when it's called for the first time, but makes `debug { … }` lines a bit more ugly (see the first sketch after this list).
- In `#step`:
  - Use `StringScanner#skip` instead of `StringScanner#scan`, because it's faster, and the size of the match is more useful at this point than the match itself (see the second sketch after this list).
  - Check `@null_steps` only after incrementing it.
- Use `if` instead of `case` in `#step`, and rely on `rule` being a `Rule` when it's not a `State`.
- Made `Rule#beginning_of_line` an attribute instead of a method.
- Use `@states` as a cache for `self.class.states`.
- Set `@null_steps` beforehand because we need it anyway.
- The default value is given to `#token` directly in the signature. Using a magic value like `:__absent__` seems strange to me.
- `Token#name`, `#parent`, and `#shortname` are now attributes instead of methods.
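To make the guard idea concrete, here is a minimal sketch; the `LexerExample` class and its methods are made up for illustration and are not Rouge's actual internals. The block-based helper pays for a method call and a block at every call site even when debugging is off, while the plain `if @debug` check costs almost nothing.

```ruby
class LexerExample
  def initialize(debug: false)
    @debug = debug
  end

  # Block-based helper: every call site passes a block and pays for a
  # method call, even when debugging is disabled.
  def debug
    puts yield if @debug
  end

  def step_with_helper(match)
    debug { "matched #{match.inspect}" }
  end

  # Guard style: a plain instance-variable check, no block, no extra call.
  def step_with_guard(match)
    puts "matched #{match.inspect}" if @debug
  end
end

lexer = LexerExample.new(debug: true)
lexer.step_with_helper("foo")  # matched "foo"
lexer.step_with_guard("foo")   # matched "foo"
```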
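And a small sketch of the `StringScanner#skip` vs. `StringScanner#scan` difference, using plain `strscan` rather than Rouge code: `#skip` advances the scanner and returns only the length of the match, while `#scan` also builds the matched String.

```ruby
require 'strscan'

ss = StringScanner.new("def foo; end")

# #skip returns the length of the match (or nil) and does not
# allocate a String for the matched text.
size = ss.skip(/def\b/)   # => 3

# #scan returns the matched String itself (or nil).
text = ss.scan(/\s+\w+/)  # => " foo"

puts size         # 3
puts text.inspect # " foo"
```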
HTML formatter
- Replaced `CGI.escape_html` (which is quite slow) with a simplified version that only escapes `&`, `<`, and `>` (see the first sketch after this list).
- There is no need to escape `"`, `'`, and `/` outside of argument lists and script tags.
- In `#span`:
  - Reduce `yield` calls as much as possible.
  - Don't use `inspect` to wrap the token shortname in quotes. Its definition could change in the future, and it's faster to add the quotes to the wrapping strings (see the second sketch after this list).
  - Call `tok.shortname` only once.
  - Use `if` instead of `case`.
- Dropped the explicit block argument from `#stream_tableized` and `span` (it already uses `yield` instead).
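As an illustration of the simplified escaping, here is a rough sketch; the method and constant names are made up and not the formatter's actual API.

```ruby
# Only &, < and > need to be escaped in this context, so a single gsub
# with a small lookup table is enough.
ESCAPE_TABLE = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;' }.freeze

def escape_special_html_chars(text)
  text.gsub(/[&<>]/, ESCAPE_TABLE)
end

puts escape_special_html_chars('a < b && b > c')
# => a &lt; b &amp;&amp; b &gt; c
```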
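And a sketch of the quoting detail, again with hypothetical helper names: instead of calling `inspect` on the shortname to produce a quoted string, the quotes are written directly into the wrapping markup.

```ruby
# Before: inspect adds the quotes, at the cost of an extra call and String.
def span_with_inspect(shortname, body)
  "<span class=#{shortname.inspect}>#{body}</span>"
end

# After: the quotes live in the template itself.
def span_with_static_quotes(shortname, body)
  "<span class=\"#{shortname}\">#{body}</span>"
end

puts span_with_static_quotes('s', 'hello')
# => <span class="s">hello</span>
```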
Open issues
There are more things that I wanted to optimize:
- The `@group_count = 0` line in `#step` looks awfully wasteful, but I couldn't find a better way.
- Do we really need the `@null_steps` check? It seems to be a good tool for debugging, but the Lexers shouldn't have such bugs in the first place. Silently failing here is also a bit weird… it would be great to get rid of this construction.

Cheers!