-
-
Notifications
You must be signed in to change notification settings - Fork 242
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speed up JSON::Validator.validate #255
Conversation
5daa621
to
1c86b40
Compare
Addressible's URI parsing shows up at the top of the profiler for our use case. This patch adds a cache to JSON::Util::URI.normalized_uri, to trade CPU time for memory. There are other expensive Addressible methods that should be cached elsewhere. benchmark: https://github.com/mjc/json-schema-perf MRI 2.2.2 before: 825.363 (± 6.2%) i/s MRI 2.2.2 after: 1.436k (± 4.5%) i/s 1.74x faster JRuby 1.7.20 before: 1.445k (± 2.5%) i/s JRuby 1.7.20 after: 2.272k (± 6.1%) i/s 1.57x faster
I also need to research finding ways to test concurrency and remove the mutex around schema initialization safely. |
|
||
def self.normalized_uri(uri) | ||
instance.cache ||= {} | ||
instance.cache[uri] ||= instance.normalized_uri(uri) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The caching here is totally unsafe in a multithreaded environment. You're going to end up with race conditions. But the side effect of that is (I admit) fairly minor here because the caller will always get the correct result.
However, I've had a long think about whether caching is the right approach (since you raised this pull request) and I think you're probably right to add it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@iainbeeston right, the only effect of the race condition is a cache miss, and that's OK, given that it's a standard effect of most caching. I can add a comment noting that if desired.
With regards to the mutex, I'm not even sure what it's purpose is. So far as I can tell it ensures that only one validator can be initialised at once. But it's not protecting access to any class variables, and it's not protecting a validator from being used by several threads concurrently and getting into an invalid state. It's possible the answer is in the git history, though I'm not in front of a computer right now. @hoxworth Do you remember why the mutex was introduced? |
Addressable::URI#fragment= triggers a revalidation of the URL, which is super slow. Don't do this when it isn't necessary.
This reverts commit ac24f48.
I'm pretty much done with this pass if we can't remove the mutex. |
To me that looks OK and I guess using some more memory is a reasonable trade for increased performance. |
# a cache miss | ||
def self.normalized_uri(uri) | ||
instance.normalize_cache ||= {} | ||
instance.normalize_cache[uri] ||= instance.normalized_uri(uri) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One last question - should the cache key here be "uri.to_s"? (As it is for the parse cache?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice catch!
@iainbeeston no problem! Happy to help. Trying to improve performance if you don't know where the bottlenecks are and whether you made a useful difference is silly. |
Weird, I think maybe the .parse cache just has to go. :( |
So it works when the key is a uri object but not when it is a string? |
It looks like the objects get modified in incompatible ways sometimes? The On Thu, Jun 4, 2015 at 3:19 PM Iain Beeston notifications@github.com
|
Do you think it'd be possible to use the stdlib's URI stuff?
|
Is it possible to freeze the URI object? Would that help? The gem used to do that but although it's faster it has trouble with
|
I'm closing this for now. I've merged most of the changes with the exception of using has to generate uuids (which gave a good speed improvement but feels risky to me) |
Addressible's URI parsing shows up at the top of the profiler for our use case on JRuby, as well as YAML dumping.
This patch
JSON::Util::URI.normalized_uri
Addressable::URI.parse
inJSON::Util::URI.parse
There are many further gains to be had still, especially if we can find a way to remove the mutex around initializing a schema, as that is proving to be terrible for performance on JRuby in a multi-threaded environment.
I'm pretty much finished with this pass - almost 8x is a pretty nice gain.
benchmark: https://github.com/mjc/json-schema-perf
MRI 2.2.2 results: 2.89x faster from file, 5.55x faster from hash.
JRuby 1.7.20 results: 2.38x faster from file, 7.86x faster from hash.
MRI 2.2.2 before
MRI 2.2.2 after
JRuby 1.7.20 before
JRuby 1.7.20 after