inner_text for Nokogiri::HTML returns nil #264

Closed
jcfischer opened this issue Apr 27, 2010 · 8 comments

Comments

@jcfischer

I have the following (new) testcase (in test/html/test_document.rb):

def test_html_inner_text
  html = Nokogiri::HTML(<<-eohtml)
<html>
    <body><div><p>Hello inner world!</p></div></body>
</html>
eohtml

  assert_equal("Hello inner world!", html.inner_text)
end

which fails, because inner_text returns nil.

I have tried to trace the problem into the Java code, and it seems that the node attribute of the XmlNode.java class is not set before the line "return parser.getDocument();" in XmlDomParserContext#do_parse is called.

However, my Java, XML, and Xerces fu is not strong enough to progress further...

cheers
jc

@flavorjones
Member

JC said on the mailing list:

We are hung up on this problem, so I'd like to try to tackle it.
Here's what I've found out:

I have tried to trace the problem into the Java code, and it seems that
the node attribute of the XmlNode.java class is not set before the
line "return parser.getDocument();" in XmlDomParserContext#do_parse is
called.

Is this the right direction? If so, who is responsible for setting
the node attribute?

cheers
jc

@yokolet
Member

yokolet commented Apr 29, 2010

Thanks for filing this bug.

I found a very similar test case in test/html/test_document.rb, but it was commented out, which is why the bug went unnoticed. According to git log, the test has been commented out since Aug 5 2009, probably since the very first version was implemented.

As JC said, the test still returns nil in the current version. I'll look into this bug.

@yokolet
Member

yokolet commented May 7, 2010

I fixed this bug in rev 28babb9 (java) and ece18b6 (java-merge). However, the pure Java version can't eliminate blank nodes, so you'll get "\nHello inner world!\n\n" as a result. I attempted to cut them out, but that broke other tests. For now, the noblanks option is probably the best approach.

html = Nokogiri::HTML(<<-eohtml) { |c| c.noblanks }
<html>
    <body><div><p>Hello inner world!</p></div></body>
</html>
eohtml

Would you (jc) test it and report whether the bug is fixed?

@flavorjones
Member

For what it's worth, the libxml2 behavior you're probably trying to emulate only removes blank text nodes when all text-node siblings are also blank. Another way to state this is that if there exists a non-blank sibling text-node, then the blank text node is not removed.
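
The rule described above can be sketched in plain Ruby, without Nokogiri. This is only an illustration of the rule as stated in this comment, not libxml2's actual implementation; the `Node` struct and the `drop_blank_text_nodes` helper are hypothetical names made up for the example.

```ruby
# Hypothetical minimal node type: kind is :text or :element.
Node = Struct.new(:kind, :content) do
  def text?
    kind == :text
  end

  # Only text nodes can be blank; a text node is blank if it holds
  # nothing but whitespace.
  def blank?
    text? && content.strip.empty?
  end
end

# Apply the rule as stated above: blank text nodes are dropped only
# when every text-node sibling is blank as well.
def drop_blank_text_nodes(siblings)
  text_nodes = siblings.select(&:text?)
  return siblings if text_nodes.empty?
  return siblings unless text_nodes.all?(&:blank?)

  siblings.reject(&:blank?)
end

# All text siblings are blank, so they are removed.
only_blanks = [Node.new(:text, "\n  "), Node.new(:element, "p"), Node.new(:text, "\n")]

# One text sibling has real content, so the blank one is kept too.
mixed = [Node.new(:text, "\n  "), Node.new(:element, "p"), Node.new(:text, "Hello")]

puts drop_blank_text_nodes(only_blanks).map(&:kind).inspect
puts drop_blank_text_nodes(mixed).map(&:kind).inspect
```

In the first list only the element survives; in the second list nothing is removed, because a non-blank text sibling exists.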

@yokolet
Member

yokolet commented May 10, 2010

Hmmm.... Is it just for inner_xml? I think blank nodes appear in a DOM tree unless noblanks is specified.

@jcfischer
Author

I have tried the new version, and while it fixes the bug in the test case, my main problem (when using Nokogiri with Webrat on JRuby on Windows) still persists. I have spent a few hours trying to pin it down, but I'd need a new test case to really pinpoint it. Due to project pressures, I will have to delay that for the time being.

Right now I work around the problem by first asking for the xpath('//html') node of the response and then calling inner_text on that node. It seems to work as far as I can tell.

@yokolet
Member

yokolet commented Nov 19, 2010

I pushed the fix in rev. 70713a0. The inner_text method won't return nil anymore. jcfischer's original example works on master.

@flavorjones
Member

Thank you, @yokolet! Closing this issue.

This issue was closed.