inner_text for Nokogiri::HTML returns nil #264
Comments
JC said on the mailing list:
Thanks for filing this bug. I found a very similar test case in test/html/test_document.rb, but it was commented out, which is why the bug went unnoticed. According to git log, this test has been commented out since Aug 5 2009, probably since the very first version was being implemented. As JC said, the test returns null even in the current version. I'll look into this bug.
I fixed this bug in rev 28babb9 (java) and ece18b6 (java-merge). However, the pure Java version couldn't eliminate blank nodes, so you'll get "\nHello inner world!\n\n" as a result. I attempted to cut them out, but that broke other tests. For now, the noblanks option is probably the best approach:
html = Nokogiri::HTML(<<-eohtml) { |c| c.noblanks } Hello inner world!
Would you (jc) test and report whether the bug is fixed?
For what it's worth, the libxml2 behavior you're probably trying to emulate only removes blank text nodes when all text-node siblings are also blank. Another way to state this is that if there exists a non-blank sibling text-node, then the blank text node is not removed.
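The rule described above can be sketched in plain Ruby. This is only an illustration of the rule as stated in this comment, not libxml2's or Nokogiri's actual implementation; `SketchNode` and `strip_blank_siblings` are hypothetical names made up for the example.

```ruby
# Hypothetical stand-in for a DOM node; this is not a Nokogiri class.
SketchNode = Struct.new(:kind, :content) do
  # True for text nodes, false for elements and anything else.
  def text?
    kind == :text
  end

  # A text node is blank when it contains only whitespace.
  def blank?
    text? && content.strip.empty?
  end
end

# Drops blank text nodes from a list of siblings, but only when every
# text-node sibling is blank. If any sibling text node has real content,
# the list is returned unchanged.
def strip_blank_siblings(siblings)
  text_nodes = siblings.select(&:text?)
  return siblings unless text_nodes.all?(&:blank?)

  siblings.reject(&:blank?)
end

# One non-blank text sibling: the blank node stays.
mixed = [
  SketchNode.new(:text, "\n"),
  SketchNode.new(:element, "p"),
  SketchNode.new(:text, "Hello inner world!")
]

# All text siblings blank: they are removed and only the element remains.
only_blank = [
  SketchNode.new(:text, "\n"),
  SketchNode.new(:element, "p"),
  SketchNode.new(:text, "  ")
]

kept = strip_blank_siblings(mixed)
stripped = strip_blank_siblings(only_blank)
```

Under this rule `kept` still holds all three nodes, while `stripped` holds only the element, which would explain why blank text survives next to "Hello inner world!".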
Hmmm... Is that just for inner_xml? I think blank nodes appear in a DOM tree unless noblanks is specified.
I have tried the new version, and while it fixes the bug in the test case, my main problem (when using Nokogiri with Webrat on JRuby on Windows) still persists. I have spent a few hours trying to pin it down, but I'd need a new test case to really pinpoint it. Due to project pressures, I will have to delay that for the time being. Right now I work around the problem by first asking for the xpath('//html') node of the response and then using inner_text on that node. That seems to work as far as I can tell.
I pushed the fix in rev. 70713a0. The inner_text method won't return nil anymore. jcfischer's original example works on master.
Thank you, @yokolet! Closing this issue.
I have the following (new) testcase (in test/html/test_document.rb):
which fails, because inner_text returns nil.
I have tried to trace the problem into the Java code, and it seems that the node attribute of the XmlNode.java class is not set before the line "return parser.getDocument();" in XmlDomParserContext#do_parse is called.
However, my Java, XML, and Xerces fu is not strong enough to get any further...
cheers
jc