Use rb_utf8_str_new/rb_utf8_str_new_cstr to create UTF8 string #950

Merged (1 commit) on Jan 5, 2025

Watson1978 (Collaborator) commented on Jan 4, 2025

This patch uses the `rb_utf8_str_new`/`rb_utf8_str_new_cstr` API to create UTF-8 strings.
It seems to give slightly better performance.

|         | before (i/s) | after (i/s) | result |
| --      | --           | --          | --     |
| Oj.load | 654.004      | 670.792     | 1.025x |

### Environment
- Linux
  - Manjaro Linux x86_64
  - Kernel: 6.12.4-1-MANJARO
  - AMD Ryzen 9 8945HS
  - gcc version 14.2.1
  - Ruby 3.4.1

### Code
```ruby
require 'bundler/inline'
gemfile do
  source 'https://rubygems.org'
  gem 'benchmark-ips'
  gem 'oj'
end

# https://github.com/miloyip/nativejson-benchmark/blob/master/data/twitter.json
json = File.read('twitter.json')

Benchmark.ips do |x|
  x.time = 10
  x.report('Oj.load compat') { Oj.load(json, mode: :compat) }
end
```

### Before
```
$ ruby json_load.rb
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [x86_64-linux]
Warming up --------------------------------------
      Oj.load compat    64.000 i/100ms
Calculating -------------------------------------
      Oj.load compat    654.004 (± 1.7%) i/s    (1.53 ms/i) -      6.592k in  10.082170s
```

### After
```
$ ruby json_load.rb
Warming up --------------------------------------
      Oj.load compat    65.000 i/100ms
Calculating -------------------------------------
      Oj.load compat    670.792 (± 1.6%) i/s    (1.49 ms/i) -      6.760k in  10.080319s
```
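
One way to read these two runs is to check whether the reported error bands overlap. The sketch below uses only the figures printed above. The `band` helper is a hypothetical name introduced here, and treating "± N%" as a simple symmetric interval around the mean is an assumption, not necessarily how benchmark-ips defines it internally.

```ruby
# Figures copied from the Before/After output above.
before = { ips: 654.004, error_pct: 1.7 }
after  = { ips: 670.792, error_pct: 1.6 }

# Hypothetical helper: treat "± N%" as a symmetric interval around the mean.
def band(run)
  delta = run[:ips] * run[:error_pct] / 100.0
  (run[:ips] - delta)..(run[:ips] + delta)
end

speedup = after[:ips] / before[:ips]
# The bands overlap when the top of the slower run reaches the bottom of the faster one.
overlap = band(before).end >= band(after).begin

puts format('speedup: %.4fx', speedup)
puts format('before band: %.1f..%.1f i/s', band(before).begin, band(before).end)
puts format('after band:  %.1f..%.1f i/s', band(after).begin, band(after).end)
puts "bands overlap: #{overlap}"
```

With these numbers the two bands overlap, so the gain is probably best read as suggestive rather than conclusive, which fits the "slightly better" wording.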

```diff
-volatile VALUE rs = rb_str_new(sw->sw.out.buf, size);
-
-// Oddly enough, when pushing ASCII characters with UTF-8 encoding or
-// even ASCII-8BIT does not change the output encoding. Pushing any
-// non-ASCII no matter what the encoding changes the output encoding
-// to ASCII-8BIT if it the string is not forced to UTF-8 here.
-rs = oj_encode(rs);
+volatile VALUE rs = rb_utf8_str_new(sw->sw.out.buf, size);
```

Watson1978 (Collaborator, Author) commented:

I think there should be no problem, since the tests passed (added 73595e6).
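
For readers less familiar with Ruby encodings, the effect of this change can be approximated at the Ruby level. This is only an analogy: it assumes `rb_str_new` returns an ASCII-8BIT (binary) string and `rb_utf8_str_new` returns a UTF-8 one, and it does not exercise Oj's C code at all.

```ruby
# Raw bytes for "café" tagged as binary, roughly what a plain
# byte-copy constructor would hand back.
raw = "caf\u00e9".b

# Old path (analogy): create a binary string, then force it to UTF-8.
forced = raw.dup.force_encoding(Encoding::UTF_8)

# New path (analogy): tag the bytes as UTF-8 at construction time.
direct = String.new(raw, encoding: Encoding::UTF_8)

puts raw.encoding
puts forced.encoding
puts direct.encoding
puts "same result: #{forced == direct}"
```

Both paths end with the same bytes and the same encoding. The second just skips the separate re-tagging step, which may be where the small speedup comes from.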

Watson1978 (Collaborator, Author) commented:

Hmm, it's still lagging behind the json gem...

```ruby
require 'bundler/inline'
gemfile do
  source 'https://rubygems.org'
  gem 'benchmark-ips'
  gem 'oj'
  gem 'json'
end

# https://github.com/miloyip/nativejson-benchmark/blob/master/data/twitter.json
json = File.read('twitter.json')

Benchmark.ips do |x|
  x.time = 10
  x.report('JSON.parse') { JSON.parse(json) }
  x.report('Oj.load compat') { Oj.load(json, mode: :compat) }
  x.compare!
end
```

```
Warming up --------------------------------------
          JSON.parse    70.000 i/100ms
      Oj.load compat    65.000 i/100ms
Calculating -------------------------------------
          JSON.parse    739.810 (± 3.1%) i/s    (1.35 ms/i) -      7.420k in  10.040792s
      Oj.load compat    655.715 (± 2.9%) i/s    (1.53 ms/i) -      6.565k in  10.021079s

Comparison:
          JSON.parse:      739.8 i/s
      Oj.load compat:      655.7 i/s - 1.13x  slower
```
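
The per-iteration times and the "slower" factor in this output follow directly from the i/s figures. A small sketch of that arithmetic, using only the numbers printed above (the variable names are introduced here for illustration):

```ruby
# Figures copied from the comparison output above.
json_ips = 739.810
oj_ips   = 655.715

# Iterations per second -> milliseconds per iteration.
json_ms = 1000.0 / json_ips
oj_ms   = 1000.0 / oj_ips

# How many times slower Oj.load is relative to JSON.parse.
slowdown = json_ips / oj_ips

puts format('JSON.parse: %.2f ms/i', json_ms)
puts format('Oj.load:    %.2f ms/i', oj_ms)
puts format('slowdown:   %.2fx', slowdown)
```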

ohler55 merged commit c798312 into ohler55:develop on Jan 5, 2025 (54 checks passed)

ohler55 (Owner) commented on Jan 5, 2025

The biggest performance bottleneck is the call to each object's `to_json` or `as_json` method. I'm not sure what can be done to avoid those calls.
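
To illustrate the kind of per-object call being described, here is a minimal sketch using the standard `json` library rather than Oj. The `Point` class and its call counter are hypothetical, introduced only to make the calls visible, and Oj's own dispatch may differ in detail.

```ruby
require 'json'

# Hypothetical class that counts how often the generator calls back into Ruby.
class Point
  class << self
    attr_accessor :calls
  end
  self.calls = 0

  def initialize(x, y)
    @x = x
    @y = y
  end

  # Called by the generator once per object being serialized.
  def to_json(*args)
    self.class.calls += 1
    { 'x' => @x, 'y' => @y }.to_json(*args)
  end
end

points = Array.new(3) { |i| Point.new(i, i * 2) }
out = JSON.generate(points)

puts out
puts "to_json calls: #{Point.calls}"
```

Each custom object costs one Ruby method call plus a temporary Hash, which a native generator cannot skip without changing what the object is allowed to emit.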

Watson1978 deleted the rb_utf8_str_new branch on January 5, 2025 at 07:05