Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid creating a new array of strings simply to be checked against #1141

Merged
merged 2 commits into from
Jun 1, 2019

Conversation

ashmaroli
Copy link
Contributor

@ashmaroli ashmaroli commented May 31, 2019

Instead, stash the array in a class constant or simply use the double-pipe (||) operator.

@jneen
Copy link
Member

jneen commented May 31, 2019

Does this have a marked performance improvement? I believe we only call this method once per lex.

@ashmaroli
Copy link
Contributor Author

Does this have a marked performance improvement?

Not at all. Just reductions in array allocations.

I believe we only call this method once per lex.

I checked this by running against Jekyll's docs source-code. The Lexer.assert_utf8! was called a total of 5196 times

@pyrmont pyrmont added the needs-review The PR needs to be reviewed label May 31, 2019
lib/rouge/lexer.rb Outdated Show resolved Hide resolved
@pyrmont pyrmont added author-action The PR has been reviewed but action by the author is needed maintainer-action The PR has been reviewed but action by a maintainer is required and removed needs-review The PR needs to be reviewed author-action The PR has been reviewed but action by the author is needed labels May 31, 2019
@jneen
Copy link
Member

jneen commented Jun 1, 2019

Wait, hang on. Why is this method being called so often?

@jneen
Copy link
Member

jneen commented Jun 1, 2019

Also, we don't really need an array, so if the allocation is really where the problem is (and not that this method is being called upwards of 5000 times for some reason) I'd prefer to just use ||.

@ashmaroli
Copy link
Contributor Author

Wait, hang on. Why is this method being called so often?

@jneen Because this method is part of two frequently called methods (based on how Jekyll and kramdown is using Rouge):

rouge/lib/rouge/lexer.rb

Lines 403 to 406 in df23679

def lex(string, opts={}, &b)
return enum_for(:lex, string, opts) unless block_given?
Lexer.assert_utf8!(string)

def filter(lexers)
# don't bother reading the input if
# we've already filtered to 1
return lexers if lexers.size == 1
source_text = get_source(@source)
Lexer.assert_utf8!(source_text)

I'd prefer to just use ||

Indeed, my issue here is to eliminate as much unnecessary allocations (and retained objects) as possible. (Irrespective of whether they have a significant impact on the performance (speed)). To that end using just strings and an operator call is all the more welcome.

@ashmaroli ashmaroli changed the title Use a constant list of encodings to check against Avoid creating a new array of strings simply to be checked against Jun 1, 2019
@pyrmont pyrmont merged commit a3d4091 into rouge-ruby:master Jun 1, 2019
@pyrmont pyrmont removed the maintainer-action The PR has been reviewed but action by a maintainer is required label Jun 1, 2019
@jneen
Copy link
Member

jneen commented Jun 1, 2019

Can you tell me more about that pattern of calling? I want to make sure we're optimizing for the right stuff.

@ashmaroli
Copy link
Contributor Author

Can you tell me more about that pattern of calling?

@jneen Jekyll has a custom Liquid tag that can be used to render code-blocks with syntax-highlighting in a Jekyll site's templates and themes.
This is how the tag uses Rouge:

      def render_rouge(code)
        require "rouge"
        formatter = ::Rouge::Formatters::HTMLLegacy.new(
          :line_numbers => @highlight_options[:linenos],
          :wrap         => false,
          :css_class    => "highlight",
          :gutter_class => "gutter",
          :code_class   => "code"
        )
        lexer = ::Rouge::Lexer.find_fancy(@lang, code) || Rouge::Lexers::PlainText
        formatter.format(lexer.lex(code))
      end

This method will be called *for every use of {% highlight %} some code {% endhighlight %} in a site configured to use Rouge as the site's syntax-highlighter.
The line of interest here is the very last expression:

formatter.format(lexer.lex(code))

A similar code exists in kramdown that is used to render fenced code-blocks in Markdown:

    def self.call(converter, text, lang, type, call_opts)
      opts = options(converter, type)
      call_opts[:default_lang] = opts[:default_lang]
      return nil unless lang || opts[:default_lang] || opts[:guess_lang]

      lexer = ::Rouge::Lexer.find_fancy(lang || opts[:default_lang], text)
      return nil if opts[:disable] || !lexer || (lexer.tag == "plaintext" && !opts[:guess_lang])

      opts[:css_class] ||= 'highlight' # For backward compatibility when using Rouge 2.0
      formatter = formatter_class(opts).new(opts)
      formatter.format(lexer.lex(text))
    end

@ashmaroli ashmaroli deleted the valid-encodings-list branch June 2, 2019 06:32
@jneen
Copy link
Member

jneen commented Jun 2, 2019

Makes sense, thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants