-
-
Notifications
You must be signed in to change notification settings - Fork 904
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clean all occurrences of _private on GC when libxml-ruby is loaded. #895
Conversation
libxml-ruby uses the global libxml2 callback xmlDeregisterNodeDefault. This callback looks in the _private field for every deleted libxml2 node. If the _private field is set, it is treated as a Ruby VALUE ptr. The callback NULLs the Ruby object's data pointer and mark & free fn pointers. When Nokogiri wraps a libxml2 document it puts a nokogiriTuple ptr in the _private field. We can't let the libxml-ruby callback be invoked on a libxml2 document before NULLing the document _private field. When Nokogiri wraps other libxml2 nodes it puts VALUE ptrs in the _private field. This makes the libxml-ruby callback generally safe for these node types. There is a caveat - it is unsafe to access VALUE ptrs during GC sweep. This commit tests whether the xmlDeregisterNodeDefault callback is set during document cleanup. If set, we traverse the document and clean up all _private fields. The previous solution was to unset xmlDeregisterNodeDefault callback. However, this doesn't work in multithreaded environments.
@ender672 @flavorjones should we merge this? I find it annoying to add hacks around libxml-ruby. Maybe we should try fixing their code? |
I suspect libxml-ruby suffers from the same GC-related segfault even when Nokogiri is not loaded. I don't see an obvious or easy fix. The Nokogiri approach (hold off on freeing anything until document GC) is one option. I've been mulling on this. Who knows, maybe there is an option that improves on both? This pull request is a good workaround for the short term. |
We're definitely seeing this kind of segfault. I just merged this to our https://github.com/PRX/nokogiri fork, and we're gonna exercise it and see what we can see. Is there any concern that by folding off GC will cause a memory leak? |
This commit shouldn't cause a memory leak. It doesn't disable libxml-ruby's callback and it only applies to Nokogiri wrapped objects. The concern is that the bug is probably not a Nokogiri issue and we're hesitant to maintain a fix in Nokogiri for a libxml-ruby bug. This bug only triggers when you have ruby 1.9.x or later, multiple threads, Nokogiri, and libxml-ruby loaded. It is hard to reproduce, and likely rare in production. @kookster, can you post a little more detail on your segfault? |
We have gotten what seems to be the same segfault on ruby 1.9.3 and 2.0.0, using nokogiri inside sidekiq running 25 concurrent workers (we have also seen with fewer workers). Sometimes it runs through a bunch of jobs fine, sometimes it segfaults. Here is the log of a segfault: |
BTW - we are using libxml-ruby, as graticule -> happy-mapper -> libxml-ruby |
Sounds like your app hits all factors needed to trigger this bug. I'm interested to see if this pull request makes it go away. |
Hopefully xml4r/libxml-ruby#62 will be pushed and this won't be needed. |
@cantino - the fix for that issue unfortunately does not address the libxml-ruby + nokogiri problem. |
Any news on whether this is going to get merged? |
When will this be merged? |
I think I'm OK with merging this, given the full context. Let me run valgrind against it and if it's clean, I'll merge. Thanks, @ender672 . |
Merged. Thanks! |
Will be in 1.6.3.rc1 to be released later today |
see related #1426 |
libxml-ruby uses the global libxml2 callback xmlDeregisterNodeDefault.
This callback looks in the _private field for every deleted libxml2
node. If the _private field is set, it is treated as a Ruby VALUE ptr.
The callback NULLs the Ruby object's data pointer and mark & free fn
pointers.
When Nokogiri wraps a libxml2 document it puts a nokogiriTuple ptr
in the _private field. We can't let the libxml-ruby callback be
invoked on a libxml2 document before NULLing the document _private
field.
When Nokogiri wraps other libxml2 nodes it puts VALUE ptrs in the
_private field. This makes the libxml-ruby callback generally safe for
these node types.
There is a caveat - it is unsafe to access VALUE ptrs during GC sweep.
This commit tests whether the xmlDeregisterNodeDefault callback is set
during document cleanup. If set, we traverse the document and clean up
all _private fields.
The previous solution was to unset xmlDeregisterNodeDefault callback.
However, this doesn't work in multithreaded environments
Fixes issues with loading Nokogiri + libxml-ruby in multithreaded environments.
See bug #881