Attributes in form-elements are not deduplicated when parsing as HTML #1949

Closed
perlan opened this issue May 5, 2023 · 1 comment · Fixed by #1950
Labels
bug (Confirmed bug that we should fix), fixed
Milestone

Comments

@perlan
Contributor

perlan commented May 5, 2023

I have an issue with form elements that contain duplicate attributes. I expected attributes in form elements to be deduplicated in the same way as in all other elements when parsing HTML via Parser.htmlParser(), but duplicate attributes always seem to be retained. With Parser.xmlParser(), form attributes are deduplicated correctly, so the issue only affects HTML parsing.

Looking through old issues and pull requests, I found #1219, which seems to fix deduplication for start tags, but that fix doesn't seem to apply to form elements.

Here is a simple test case adapted from HtmlParserTest:

    @Test public void dropsDuplicateAttributesInFormElement() {
        String html = "<form One=One ONE=Two Two=two one=Three One=Four two=Five></form>";
        Parser parser = Parser.htmlParser().setTrackErrors(10);
        Document doc = parser.parseInput(html, "");

        Element p = doc.selectFirst("form");
        assertEquals("<form one=\"One\" two=\"two\"></form>", p.outerHtml()); // normalized names due to lower casing

        assertEquals(1, parser.getErrors().size());
        assertEquals("Dropped duplicate attribute(s) in tag [form]", parser.getErrors().get(0).getErrorMessage());
    }
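
For readers unfamiliar with the expected behavior, the rule the test above asserts (names compared case-insensitively, the first value for each name kept, later duplicates dropped) can be sketched in plain Java. This is only an illustration under those assumptions, not jsoup's actual implementation; the class `AttributeDedupSketch` and the method `dedupe` are hypothetical names invented for this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; this is not code from jsoup.
public class AttributeDedupSketch {

    /**
     * Keeps the first value seen for each attribute name, treating names as
     * case-insensitive by lower-casing them. Input is an array of
     * {name, value} pairs in source order.
     */
    public static Map<String, String> dedupe(String[][] attributes) {
        Map<String, String> kept = new LinkedHashMap<>(); // preserves source order
        for (String[] attr : attributes) {
            String name = attr[0].toLowerCase(Locale.ROOT);
            kept.putIfAbsent(name, attr[1]); // a later duplicate is dropped
        }
        return kept;
    }

    public static void main(String[] args) {
        // Same attributes as the <form> tag in the test case above.
        String[][] attrs = {
            {"One", "One"}, {"ONE", "Two"}, {"Two", "two"},
            {"one", "Three"}, {"One", "Four"}, {"two", "Five"}
        };
        System.out.println(dedupe(attrs)); // prints {one=One, two=two}
    }
}
```

The surviving pairs, `one="One"` and `two="two"`, match the attributes in the `outerHtml()` string the test expects.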
perlan added a commit to perlan/jsoup that referenced this issue May 5, 2023
@jhy jhy added the bug Confirmed bug that we should fix label May 6, 2023
@jhy
Owner

jhy commented May 6, 2023

Good find, thanks for the testcase repro and the PR. Added comments in the PR.

@jhy jhy closed this as completed in #1950 May 8, 2023
jhy added a commit that referenced this issue May 8, 2023
Add test-case and fixes for attribute deduplication in form and empty elements

Fixes #1949
---------

Co-authored-by: Jonathan Hedley <jonathan@hedley.net>
jhy added a commit that referenced this issue May 8, 2023
@jhy jhy added the fixed label May 8, 2023
@jhy jhy added this to the 1.16.2 milestone May 8, 2023