New algorithm for plain text values #168
Merged
Per the change control, I previously opened an issue to standardise `textContent` and proposed a resolution. This is the implementation of that resolution in the parser, so it can be tested and iterated upon before possibly being included in the specification.
This PHP parser was already using a special `innerText` method, but it was not adopted by any other parsers, nor did it look like anyone wanted to write it out as part of the microformats parsing specification. This method was based on a text function of microformat-shiv, which in its turn was an emulation of Internet Explorer behaviour.

Things of note:
- This replaces the old `textContent` and `innerText` methods. There is no replacement for `innerText`; the new `textContent` is the public method for extracting a plain text value from an element.
- The second new method `elementToString` is set to private as it should not be called outside of `textContent`. It exists on its own only so it can recursively call itself.
- Whenever `textContent` is called it is no longer wrapped in a `unicodeTrim` call. Trimming is handled by the algorithm itself. If it turns out the current trimming in the algorithm isn’t sufficient in practice, we should revise the algorithm.
- The new `PlainTextTest` currently validates all 9 examples from aaronpk/microformats-whitespace-tests.

This broke 3 parser tests, which have been resolved:
- `ParseImpliedTest::testParsesImpliedNameConsistentWithPName` expected a line break in the `name` property. With the new algorithm, line breaks are collapsed into spaces the same way browsers would do.
- `ParserTest::testParseEResolvesRelativeLinks` expected two spaces in the plain text `value` of the `content` property. With the new algorithm, consecutive spaces are collapsed to a single one the same way browsers would do.
- `ParserTest::testHtmlEncodesImpliedProperties` was… just wrong? It expected only the string `<name>` as the value of the `name` property through implied rules. And somehow it had to sidestep the `<img>` element completely to do so. I don’t know why the previous parsing even allowed that.
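
For readers who want a feel for the whitespace behaviour described above, here is a minimal sketch. It is not the code from this pull request: the class name `PlainTextSketch` is hypothetical, the method names are only borrowed from the description, and any element-specific handling the real algorithm may do (for example for `<img>`) is left out. It only illustrates a public `textContent` that trims its own output, and a private recursive `elementToString`, with line breaks and consecutive spaces collapsed into a single space.

```php
<?php
// Illustrative sketch only; NOT the implementation from this pull request.
// PlainTextSketch is a hypothetical name used for this example.
final class PlainTextSketch
{
    /**
     * Public entry point: returns the plain text value of an element.
     * Whitespace collapsing and trimming happen here, so callers do not
     * need to wrap the result in a separate trim call.
     */
    public function textContent(DOMElement $element): string
    {
        $text = $this->elementToString($element);
        // Collapse line breaks, tabs and runs of spaces into one space.
        $text = (string) preg_replace('/[ \t\r\n\f]+/', ' ', $text);
        return trim($text, ' ');
    }

    /**
     * Recursive helper: concatenates the text of a node and its descendants.
     * Private because only textContent() is meant to call it.
     */
    private function elementToString(DOMNode $node): string
    {
        if ($node instanceof DOMText) {
            return $node->nodeValue;
        }
        if (!$node instanceof DOMElement) {
            return ''; // comments and other node types contribute nothing
        }
        $out = '';
        foreach ($node->childNodes as $child) {
            $out .= $this->elementToString($child);
        }
        return $out;
    }
}

// Usage: a line break plus several spaces between two words.
$doc = new DOMDocument();
$doc->loadHTML("<p>Jane   <b>\n  Doe</b></p>");
$value = (new PlainTextSketch())->textContent(
    $doc->getElementsByTagName('p')->item(0)
);
echo $value, PHP_EOL; // prints "Jane Doe"
```

Collapsing after the recursive concatenation, rather than per text node, is a choice made for this sketch so that whitespace spanning two adjacent nodes still ends up as one space.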