-
-
Notifications
You must be signed in to change notification settings - Fork 242
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Made sure clear_cache also clears the cache of parsed uris #361
Conversation
ec65e2e
to
30d3a05
Compare
There are two main caches in json-schema. One is for schemas, one is for parsed uris. Previously, setting `:clear_cache` as a validation option would clear the schema cache but not the uri cache. This has been reported to cause a memory leak (see issue voxpupuli#329) as over time the uri cache gets bigger and bigger and is never cleared. This change makes sure that whenever the schema cache is cleared, the uri cache is also cleared. Incidentally this change is not thread safe, and multithreaded applications that clear the cache may see issues. But until we can refactor to either disable caching or get rid of class variables, I don't believe there's much we can (realistically) do about that. This is still an improvement on what's gone before.
30d3a05
to
334edf0
Compare
My brain sometimes tickles me with the idea of a major restructuring of json-schema. Basically to extract functionality according to the separation of concerns principle. The validator base class does really, really much and I still have not the feeling to be at home there (which I consider a bad sign). |
Yes I agree. I can see a way forward though by extracting out functionality one object at a time. There is clearly an object for URI parsing, and one for loading data from a file or uri. We should not be using class variables, but instead keep a set of objects internal to each validator object, and allow the user to reuse validator objects if they want to cache previously used data. Another option that I have considered is extracting out some of the functionality into separate gems. For example, a gem could be made that just evaluates json references. Json parsing is already handled by the json gem. The json schema spec was split into three parts for draft 4 and these could possibly be represented by 3 gems? (Though I'm not sure if that would work?) Regarding this PR, I think we should possibly have a single object that handles URI parsing, and (for now) keep an instance of that object in a class variable in Validator. That object could keep the URI caches internal to itself. This would be similar to the Reader objects we already have. Although I would argue that what is already in this pull request is good enough for now and we can improve the object design in a new pull request. |
Oh, and another thing - I've been trying hard (with this and other pull requests) to improve the reliability and coverage of the test suite, so that if we do choose to refactor we can be more confident that nothing will break. Although we have a lot of tests some areas do not even have a single test that covers them! In other cases we have tests that only pass if they are run in the correct order! |
Fine. I agree that we should take small steps - one aspect at a time. Actually I think this is the only realistic way to go. And I'm also in favour of improving the test suite itself and extending coverage along the way - wrt testing I'm rather the 'more is better' kind of guy, even more as this is a library. What three parts are you referring to? I currently do not really have the spec (details) present in my mind - I'm probably kind of asked to much of by the 2nd paragraph in your first comment. ;) And finally, yes, lets merge this. Its a small step. |
Ah the three json schema drafts are the core, validation and hyper schema drafts. I suspect it was split up in draft4 because it was becoming too long... |
There are two main caches in json-schema. One is for schemas, one is for
parsed uris. Previously, setting
:clear_cache
as a validation optionwould clear the schema cache but not the uri cache. This has been
reported to cause a memory leak (see issue #329) as over time the uri
cache gets bigger and bigger and is never cleared.
This change makes sure that whenever the schema cache is cleared, the
uri cache is also cleared.
Incidentally this change is not thread safe, and multithreaded
applications that clear the cache may see issues. But until we can
refactor to either disable caching or get rid of class variables,
I don't believe there's much we can (realistically) do about that. This
is still an improvement on what's gone before.