Skip to content

Commit

Permalink
Use a SoftPool of ElementIterators vs a single ThreadLocal
Browse files Browse the repository at this point in the history
For nested `:has()` evaluators, using a single ThreadLocal of the Element Iterator was effectively non-reentrant, and reusing it may trash the outer iteration.

Fixes #2131
  • Loading branch information
jhy committed Sep 11, 2024
1 parent 7e2d503 commit 0daab3d
Show file tree
Hide file tree
Showing 3 changed files with 37 additions and 10 deletions.
4 changes: 3 additions & 1 deletion CHANGES.md
Original file line number Diff line number Diff line change
Expand Up @@ -21,12 +21,14 @@
* `Element.cssSelector()` would fail if the element's class contained a `*`
character. [2169](https://github.com/jhy/jsoup/issues/2169)
* When tracking source ranges, a text node following an invalid self-closing element may be left
untracked.[2175](https://github.com/jhy/jsoup/issues/2175)
untracked. [2175](https://github.com/jhy/jsoup/issues/2175)
* When a document has no doctype, or a doctype not named `html`, it should be parsed in Quirks
Mode. [2197](https://github.com/jhy/jsoup/issues/2197)
* With a selector like `div:has(span + a)`, the `has()` component was not working correctly, as the inner combining
query caused the evaluator to match those against the outer's siblings, not
children. [2187](https://github.com/jhy/jsoup/issues/2187)
* A selector query that included multiple `:has()` components in a nested `:has()` might incorrectly
execute. [2131](https://github.com/jhy/jsoup/issues/2131)

## 1.18.1 (2024-Jul-10)

Expand Down
22 changes: 14 additions & 8 deletions src/main/java/org/jsoup/select/StructuralEvaluator.java
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
package org.jsoup.select;

import org.jsoup.internal.Functions;
import org.jsoup.internal.SoftPool;
import org.jsoup.internal.StringUtil;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.NodeIterator;
Expand Down Expand Up @@ -51,8 +52,8 @@ public boolean matches(Element root, Element element) {
}

static class Has extends StructuralEvaluator {
static final ThreadLocal<NodeIterator<Element>> ThreadElementIter =
ThreadLocal.withInitial(() -> new NodeIterator<>(new Element("html"), Element.class));
static final SoftPool<NodeIterator<Element>> ElementIterPool =
new SoftPool<>(() -> new NodeIterator<>(new Element("html"), Element.class));
// the element here is just a placeholder so this can be final - gets set in restart()

private final boolean checkSiblings; // evaluating against siblings (or children)
Expand All @@ -71,13 +72,18 @@ public Has(Evaluator evaluator) {
}
}
// otherwise we only want to match children (or below), and not the input element. And we want to minimize GCs so reusing the Iterator obj
NodeIterator<Element> it = ThreadElementIter.get();
NodeIterator<Element> it = ElementIterPool.borrow();
it.restart(element);
while (it.hasNext()) {
Element el = it.next();
if (el == element) continue; // don't match self, only descendants
if (evaluator.matches(element, el))
return true;
try {
while (it.hasNext()) {
Element el = it.next();
if (el == element) continue; // don't match self, only descendants
if (evaluator.matches(element, el)) {
return true;
}
}
} finally {
ElementIterPool.release(it);
}
return false;
}
Expand Down
21 changes: 20 additions & 1 deletion src/test/java/org/jsoup/select/SelectorTest.java
Original file line number Diff line number Diff line change
Expand Up @@ -1309,7 +1309,7 @@ public void emptyPseudo() {
}

@Test void divHasDivPreceding() {
// https://github.com/jhy/jsoup/issues/2131 , dupe of https://github.com/jhy/jsoup/issues/2187
// https://github.com/jhy/jsoup/issues/2131
String html = "<div id=1>\n" +
"<div 1><span>hello</span></div>\n" +
"<div 2><span>there</span></div>\n" +
Expand All @@ -1324,4 +1324,23 @@ public void emptyPseudo() {
assertEquals("div", els.first().normalName());
assertEquals("1", els.first().id());
}

@Test void nestedMultiHas() {
// https://github.com/jhy/jsoup/issues/2131
String html =
"<html>" +
"<head></head>" +
"<body>" +
"<div id=o>" +
"<div id=i1><span id=s1>hello</span></div>" +
"<div id=i2><span id=s2>world</span></div>" +
"</div>" +
"</body></html>";
Document document = Jsoup.parse(html);

String q = "div:has(> div:has(> span) + div:has(> span))";
Elements els = document.select(q);
assertEquals(1, els.size());
assertEquals("o", els.get(0).id());
}
}

0 comments on commit 0daab3d

Please sign in to comment.