Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Made sure clear_cache also clears the cache of parsed uris #361

Merged
merged 1 commit into from
Oct 5, 2016

Conversation

iainbeeston
Copy link
Contributor

There are two main caches in json-schema. One is for schemas, one is for
parsed uris. Previously, setting :clear_cache as a validation option
would clear the schema cache but not the uri cache. This has been
reported to cause a memory leak (see issue #329) as over time the uri
cache gets bigger and bigger and is never cleared.

This change makes sure that whenever the schema cache is cleared, the
uri cache is also cleared.

Incidentally this change is not thread safe, and multithreaded
applications that clear the cache may see issues. But until we can
refactor to either disable caching or get rid of class variables,
I don't believe there's much we can (realistically) do about that. This
is still an improvement on what's gone before.

There are two main caches in json-schema. One is for schemas, one is for
parsed uris. Previously, setting `:clear_cache` as a validation option
would clear the schema cache but not the uri cache. This has been
reported to cause a memory leak (see issue voxpupuli#329) as over time the uri
cache gets bigger and bigger and is never cleared.

This change makes sure that whenever the schema cache is cleared, the
uri cache is also cleared.

Incidentally this change is not thread safe, and multithreaded
applications that clear the cache may see  issues. But until we can
refactor to either disable caching or get rid of class variables,
I don't believe there's much we can (realistically) do about that. This
is still an improvement on what's gone before.
@RST-J
Copy link
Contributor

RST-J commented Oct 4, 2016

My brain sometimes tickles me with the idea of a major restructuring of json-schema. Basically to extract functionality according to the separation of concerns principle. The validator base class does really, really much and I still have not the feeling to be at home there (which I consider a bad sign).
I have no clear idea whether and how to do that. I also haven't spent too much thoughts about it as this only makes sense, if we a) agree this is a good idea and b) then agree on how this then considered good idea should be carried out.

@iainbeeston
Copy link
Contributor Author

Yes I agree. I can see a way forward though by extracting out functionality one object at a time. There is clearly an object for URI parsing, and one for loading data from a file or uri. We should not be using class variables, but instead keep a set of objects internal to each validator object, and allow the user to reuse validator objects if they want to cache previously used data.

Another option that I have considered is extracting out some of the functionality into separate gems. For example, a gem could be made that just evaluates json references. Json parsing is already handled by the json gem. The json schema spec was split into three parts for draft 4 and these could possibly be represented by 3 gems? (Though I'm not sure if that would work?)

Regarding this PR, I think we should possibly have a single object that handles URI parsing, and (for now) keep an instance of that object in a class variable in Validator. That object could keep the URI caches internal to itself. This would be similar to the Reader objects we already have. Although I would argue that what is already in this pull request is good enough for now and we can improve the object design in a new pull request.

@iainbeeston
Copy link
Contributor Author

Oh, and another thing - I've been trying hard (with this and other pull requests) to improve the reliability and coverage of the test suite, so that if we do choose to refactor we can be more confident that nothing will break. Although we have a lot of tests some areas do not even have a single test that covers them! In other cases we have tests that only pass if they are run in the correct order!

@RST-J
Copy link
Contributor

RST-J commented Oct 4, 2016

Fine. I agree that we should take small steps - one aspect at a time. Actually I think this is the only realistic way to go. And I'm also in favour of improving the test suite itself and extending coverage along the way - wrt testing I'm rather the 'more is better' kind of guy, even more as this is a library.

What three parts are you referring to? I currently do not really have the spec (details) present in my mind - I'm probably kind of asked to much of by the 2nd paragraph in your first comment. ;)
In general I'd say we shouldn't make things too complex. Managing two dependencies to achieve one thing - validating json-schema is what users want to do - could introduce additional complexity. But if they make sense on their own, it could also be a good decision, although my gut feeling right now suggests otherwise.

And finally, yes, lets merge this. Its a small step.

@iainbeeston
Copy link
Contributor Author

Ah the three json schema drafts are the core, validation and hyper schema drafts. I suspect it was split up in draft4 because it was becoming too long...

@iainbeeston iainbeeston merged commit 7158532 into voxpupuli:master Oct 5, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants