Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean all occurrences of _private on GC when libxml-ruby is loaded. #895

Closed
wants to merge 1 commit into from

Conversation

ender672
Copy link
Member

libxml-ruby uses the global libxml2 callback xmlDeregisterNodeDefault.

This callback looks in the _private field for every deleted libxml2
node. If the _private field is set, it is treated as a Ruby VALUE ptr.
The callback NULLs the Ruby object's data pointer and mark & free fn
pointers.

When Nokogiri wraps a libxml2 document it puts a nokogiriTuple ptr
in the _private field. We can't let the libxml-ruby callback be
invoked on a libxml2 document before NULLing the document _private
field.

When Nokogiri wraps other libxml2 nodes it puts VALUE ptrs in the
_private field. This makes the libxml-ruby callback generally safe for
these node types.

There is a caveat - it is unsafe to access VALUE ptrs during GC sweep.

This commit tests whether the xmlDeregisterNodeDefault callback is set
during document cleanup. If set, we traverse the document and clean up
all _private fields.

The previous solution was to unset xmlDeregisterNodeDefault callback.
However, this doesn't work in multithreaded environments

Fixes issues with loading Nokogiri + libxml-ruby in multithreaded environments.

See bug #881

libxml-ruby uses the global libxml2 callback xmlDeregisterNodeDefault.

This callback looks in the _private field for every deleted libxml2
node. If the _private field is set, it is treated as a Ruby VALUE ptr.
The callback NULLs the Ruby object's data pointer and mark & free fn
pointers.

When Nokogiri wraps a libxml2 document it puts a nokogiriTuple ptr
in the _private field. We can't let the libxml-ruby callback be
invoked on a libxml2 document before NULLing the document _private
field.

When Nokogiri wraps other libxml2 nodes it puts VALUE ptrs in the
_private field. This makes the libxml-ruby callback generally safe for
these node types.

There is a caveat - it is unsafe to access VALUE ptrs during GC sweep.

This commit tests whether the xmlDeregisterNodeDefault callback is set
during document cleanup. If set, we traverse the document and clean up
all _private fields.

The previous solution was to unset xmlDeregisterNodeDefault callback.
However, this doesn't work in multithreaded environments.
@tenderlove
Copy link
Member

@ender672 @flavorjones should we merge this? I find it annoying to add hacks around libxml-ruby. Maybe we should try fixing their code?

@ender672
Copy link
Member Author

I suspect libxml-ruby suffers from the same GC-related segfault even when Nokogiri is not loaded.

I don't see an obvious or easy fix. The Nokogiri approach (hold off on freeing anything until document GC) is one option. I've been mulling on this. Who knows, maybe there is an option that improves on both?

This pull request is a good workaround for the short term.

@kookster
Copy link

We're definitely seeing this kind of segfault. I just merged this to our https://github.com/PRX/nokogiri fork, and we're gonna exercise it and see what we can see. Is there any concern that by folding off GC will cause a memory leak?

@ender672
Copy link
Member Author

Is there any concern that by folding off GC will cause a memory leak?

This commit shouldn't cause a memory leak. It doesn't disable libxml-ruby's callback and it only applies to Nokogiri wrapped objects.

The concern is that the bug is probably not a Nokogiri issue and we're hesitant to maintain a fix in Nokogiri for a libxml-ruby bug.

This bug only triggers when you have ruby 1.9.x or later, multiple threads, Nokogiri, and libxml-ruby loaded. It is hard to reproduce, and likely rare in production. @kookster, can you post a little more detail on your segfault?

@kookster
Copy link

We have gotten what seems to be the same segfault on ruby 1.9.3 and 2.0.0, using nokogiri inside sidekiq running 25 concurrent workers (we have also seen with fewer workers). Sometimes it runs through a bunch of jobs fine, sometimes it segfaults. Here is the log of a segfault:

https://gist.github.com/kookster/5578695

@kookster
Copy link

BTW - we are using libxml-ruby, as graticule -> happy-mapper -> libxml-ruby
I am right now rewriting to use geocoder instead just to get rid of it ;)

@ender672
Copy link
Member Author

Sounds like your app hits all factors needed to trigger this bug.

I'm interested to see if this pull request makes it go away.

@cantino
Copy link

cantino commented Jul 31, 2013

Hopefully xml4r/libxml-ruby#62 will be pushed and this won't be needed.

@ender672
Copy link
Member Author

ender672 commented Aug 1, 2013

@cantino - the fix for that issue unfortunately does not address the libxml-ruby + nokogiri problem.

@JakeAustwick
Copy link

Any news on whether this is going to get merged?

@ghost
Copy link

ghost commented Apr 22, 2014

When will this be merged?

@flavorjones
Copy link
Member

I think I'm OK with merging this, given the full context. Let me run valgrind against it and if it's clean, I'll merge. Thanks, @ender672 .

@flavorjones
Copy link
Member

Merged. Thanks!

@flavorjones flavorjones deleted the libxml-ruby-multithreaded-compatible branch May 22, 2014 16:18
@flavorjones
Copy link
Member

Will be in 1.6.3.rc1 to be released later today

@flavorjones
Copy link
Member

see related #1426

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants