fix: revert libxml2 regression with HTML4 recovery
Fixes #2461
1 parent 49b8663, commit 5970fd9
Showing 2 changed files with 65 additions and 0 deletions.
45 changes: 45 additions & 0 deletions
patches/libxml2/0010-Revert-Different-approach-to-fix-quadratic-behavior.patch
@@ -0,0 +1,45 @@
From ddc5f3d22644e0f6fbcc20541c86825757ffee62 Mon Sep 17 00:00:00 2001
From: Mike Dalessio <mike.dalessio@gmail.com>
Date: Mon, 21 Feb 2022 18:27:45 -0500
Subject: [PATCH] Revert "Different approach to fix quadratic behavior in HTML
 push parser"

This reverts commit 798bdf13f6964a650b9a0b7b4b3a769f6f1d509a.
---
 HTMLparser.c | 14 +-------------
 1 file changed, 1 insertion(+), 13 deletions(-)

diff --git a/HTMLparser.c b/HTMLparser.c
index eba2d7c..c0b8119 100644
--- a/HTMLparser.c
+++ b/HTMLparser.c
@@ -3960,25 +3960,13 @@ htmlParseStartTag(htmlParserCtxtPtr ctxt) {
         htmlParseErr(ctxt, XML_ERR_NAME_REQUIRED,
                      "htmlParseStartTag: invalid element name\n",
                      NULL, NULL);
-        /*
-         * The recovery code is disabled for now as it can result in
-         * quadratic behavior with the push parser. htmlParseStartTag
-         * must consume all content up to the final '>' in order to avoid
-         * rescanning for this terminator.
-         *
-         * For a proper fix in line with HTML5, htmlParseStartTag and
-         * htmlParseElement should only be called when there's an ASCII
-         * alpha character following the initial '<'. Otherwise, the '<'
-         * should be emitted as text (unless followed by '!', '/' or '?').
-         */
-#if 0
         /* if recover preserve text on classic misconstructs */
         if ((ctxt->recovery) && ((IS_BLANK_CH(CUR)) || (CUR == '<') ||
             (CUR == '=') || (CUR == '>') || (((CUR >= '0') && (CUR <= '9'))))) {
             htmlParseCharDataInternal(ctxt, '<');
             return(-1);
         }
-#endif
+

         /* Dump the bogus tag like browsers do */
         while ((CUR != 0) && (CUR != '>') &&
--
2.31.0

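For readers who want the restored recovery condition in isolation, here is a minimal standalone sketch of the character test that the patch re-enables. The helper name `lt_is_preserved_as_text` is hypothetical and is not part of libxml2. It assumes that `CUR` at this point is the character immediately after the `<` that failed to yield an element name, and that `IS_BLANK_CH` accepts space, tab, LF, and CR.

```c
#include <stdbool.h>

/*
 * Hypothetical helper, not a libxml2 function.
 *
 * Mirrors the condition guarded by the restored recovery branch in
 * htmlParseStartTag: when the parser is in recovery mode and the
 * character after '<' is blank, '<', '=', '>' or an ASCII digit, the
 * '<' is kept as character data instead of being dropped together with
 * the rest of the bogus tag.
 *
 * recovery: stands in for ctxt->recovery
 * next:     stands in for CUR (assumed to be the character after '<')
 */
static bool lt_is_preserved_as_text(bool recovery, unsigned char next)
{
    /* Assumed equivalent of IS_BLANK_CH: space, tab, LF, CR. */
    bool blank = (next == 0x20) || (next == 0x09) ||
                 (next == 0x0A) || (next == 0x0D);
    bool digit = (next >= '0') && (next <= '9');

    return recovery &&
           (blank || next == '<' || next == '=' || next == '>' || digit);
}
```

Under these assumptions, `lt_is_preserved_as_text(true, '3')` is true (as in text like `a <3 b`), while `lt_is_preserved_as_text(true, '%')` is false, so that input falls through to the "dump the bogus tag" loop shown in the patch context.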