New algorithm for plain text values #168

Merged (2 commits) on Mar 29, 2018

@Zegnat Zegnat commented Mar 26, 2018

Per the change control, I have earlier opened an issue to standardise textContent and proposed a resolution. This is the implementation of that resolution in the parser, so it can be tested and iterated upon before possibly being included in the specification.

This PHP parser was already using a special innerText method, but no other parser adopted it, and nobody seemed to want to write it out as part of the microformats parsing specification. The method was based on a text function of microformat-shiv, which in turn was an emulation of Internet Explorer behaviour.

Things of note:

  1. This replaces the old textContent and innerText methods. There is no replacement for innerText; the new textContent is the public method for extracting a plain text value from an element.

    The second new method elementToString is set to private as it should not be called outside of textContent. It exists on its own only so it can recursively call itself.

  2. Whenever textContent is called it is no longer wrapped in a unicodeTrim call. Trimming is handled by the algorithm itself. If it turns out the current trimming in the algorithm isn’t sufficient in practice, we should revise the algorithm.

  3. The new PlainTextTest currently validates all 9 examples from aaronpk/microformats-whitespace-tests.

  4. This broke 3 parser tests, which have been resolved:

    1. ParseImpliedTest::testParsesImpliedNameConsistentWithPName expected a line break in the name property. With the new algorithm, line breaks are collapsed into spaces the same way browsers would do.
    2. ParserTest::testParseEResolvesRelativeLinks expected two spaces in the plain text value of the content property. With the new algorithm, consecutive spaces are collapsed to a single one the same way browsers would do.
    3. ParserTest::testHtmlEncodesImpliedProperties was… just wrong? It expected only the string <name> as the value of the name property through implied rules. And somehow it had to sidestep the <img> element completely to do so. I don’t know why the previous parsing even allowed that.
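
To make the collapsing and trimming behaviour described above easier to picture, here is a minimal sketch in Python (not the PHP code from this PR). The names `element_to_string` and `text_content` only mirror the method names mentioned above. The exact whitespace set, the use of `xml.etree.ElementTree`, and the absence of any special handling for elements such as `<img>` are assumptions made for illustration; the real algorithm may differ.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: only ASCII whitespace is collapsed, roughly as browsers do.
ASCII_WHITESPACE = " \t\n\r\f"


def element_to_string(element):
    """Recursively concatenate the text of an element and its descendants."""
    parts = [element.text or ""]
    for child in element:
        parts.append(element_to_string(child))
        # Text after the child's closing tag belongs to the parent element.
        parts.append(child.tail or "")
    return "".join(parts)


def text_content(element):
    """Return a plain text value with whitespace collapsed and trimmed."""
    raw = element_to_string(element)
    # Collapse each run of spaces, tabs, and line breaks into a single space.
    collapsed = re.sub(r"[ \t\n\r\f]+", " ", raw)
    # Trimming is part of the function itself, so callers need no extra trim.
    return collapsed.strip(ASCII_WHITESPACE)


fragment = ET.fromstring("<p class='p-name'>\n  Hello  <b>wide</b>\n  world\n</p>")
print(text_content(fragment))  # Hello wide world
```

Only `text_content` is meant to be called directly; the recursive helper exists on its own for the same reason elementToString does in point 1.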

Zegnat commented Mar 26, 2018

Let's first get 0.4.2 out before considering this for merging.

It would be interesting if @aaronpk could test this branch out in his reader prior to merging, and get some data on how well it performs with posts that previously troubled him.

aaronpk commented Mar 26, 2018

If we publish this as 0.4.3-alpha I can relatively easily run it on Aperture for a while to see how it works.

@gRegorLove

Awesome work, @Zegnat! 🎉

@aaronpk aaronpk merged commit e8da04f into microformats:master Mar 29, 2018