Define the order of items any time an array is used in the parsed output. #29

Closed
Zegnat opened this issue Mar 20, 2018 · 4 comments

@Zegnat
Member

Zegnat commented Mar 20, 2018

The JSON specification, as defined in both RFC 8259 and RFC 7159 (the basis of I-JSON, see #23), states:

An array is an ordered sequence of zero or more values.

The trick here is the word ordered. The two arrays ["red", "blue"] and ["blue", "red"] are different in JSON documents because their order is different. It follows that two microformats parser implementations that generate different arrays from the same input HTML can be said to be incompatible with each other, as they have distinctly different output. (As seen in #22.)
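
To make the ordering point concrete, here is a minimal Python sketch. The use of Python's standard json module is only an illustration and is not prescribed by any specification; the variable names are made up for this example.

```python
import json

# The two arrays from the paragraph above, parsed as JSON documents.
a = json.loads('["red", "blue"]')
b = json.loads('["blue", "red"]')

# JSON arrays become Python lists, and list comparison is order-sensitive.
arrays_equal = (a == b)

# Treated as sets, the same values compare equal, because sets are unordered.
sets_equal = (set(a) == set(b))

print(arrays_equal, sets_equal)
```

A consumer comparing the output of two parsers with a plain equality check would therefore see a difference even when the set of values is identical.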

The microformats parsing specification should fix this by specifying what order should be used any time an array is used.

Most of the arrays used should follow document order. When filling the items or children arrays with microformat structures it is important to keep document order, as consumers may need to find the first occurrence of a specific object. (Example: the authorship discovery algorithm depends on being able to access the first h-card matching specific constraints.)

But some arrays should not follow document order as they are semantically unordered collections. As an example, the following HTML has 2 div elements. While the order of their class names is different in the source, this does not matter. Both have an identical set of classes:

<div class="alpha beta"></div>
<div class="beta alpha"></div>

Because JSON has no way to represent an unordered array, the microformats specification should define an arbitrary sort order for these cases.

Using source order here would be a bad idea: it could lead people to treat the order as meaningful, or as something consuming code can rely on when it should not. (Relying on source order is a potential source of bugs.) Thus data “derived from unordered sets in the source HTML MUST NOT imply any source order”.

The class and rel attributes are the only HTML attributes microformats parsing depends on whose values are sets, where order does not matter. They are mapped to the type and rels arrays respectively.

The proposed solution is to:

  1. define that whenever items are added to an array during microformats parsing, this matches the source order,
  2. except for the type and rels arrays which should be in alphabetical order.
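
The two points above can be sketched in Python as follows. This is only an illustration under stated assumptions: `parse_tokens` is a hypothetical helper, `found_in_document` is made-up input standing in for whatever a real parser discovers, and `str.split()` is a simplification that splits on any whitespace.

```python
def parse_tokens(attr_value):
    """Hypothetical helper: turn a class or rel value into a sorted list."""
    # Sorting removes any trace of source order. sorted() compares by code
    # point, which matches alphabetical order for lowercase ASCII names.
    return sorted(attr_value.split())

# Point 1: structures found while walking the document keep document order.
found_in_document = ["h-entry", "h-card", "h-entry"]  # hypothetical discovery order
items = []
for root in found_in_document:
    items.append({"type": [root], "properties": {}})

# Point 2: both div elements from the earlier example yield the same array.
first = parse_tokens("alpha beta")
second = parse_tokens("beta alpha")
print(first == second)
```

With this rule, consumers can still find the first matching item in `items`, while nothing about the order of `first` or `second` reveals how the class names were written in the source.
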
@Zegnat
Member Author

Zegnat commented Mar 20, 2018

There is precedent for using alphabetical sorting. The PHP parser already does this:

<div class="h-entry h-cite h-entry"></div>
"type": [
  "h-cite",
  "h-entry",
  "h-entry"
]

The development version of the Python parser has also been updated to use an alphabetical sorting:

<div class="h-entry h-cite h-entry"></div>
"type": [
  "h-cite",
  "h-entry"
]
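
The difference between the two outputs above comes down to whether duplicates are removed before sorting. The following Python sketch reproduces both results; it is not the actual code of either parser, and the variable names are made up for illustration.

```python
# Class names from the example markup, in source order.
classes = "h-entry h-cite h-entry".split()

# Sorting alone keeps the duplicate, as in the first output.
sorted_only = sorted(classes)

# Removing duplicates first, then sorting, gives the second output.
sorted_unique = sorted(set(classes))

print(sorted_only)
print(sorted_unique)
```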

@tantek
Member

tantek commented Mar 20, 2018

As I wrote in http://tantek.com/2018/079/t2 (#30 (comment)), the 'type' array must not convey any ordering semantics from the source (thus it must enforce an artificial order that does not convey anything, in addition to uniqueness per set requirements). All other uses of arrays in the parsed JSON output (children, property values, rel subarrays) already convey appropriate document order semantics.

(Originally published at: http://tantek.com/2018/079/t3/)

@kartikprabhu
Member

+1
cc: @tantek

@Zegnat
Member Author

Zegnat commented Mar 21, 2018

The spec has been updated so that both the type and rels arrays must be sorted alphabetically.

@Zegnat Zegnat closed this as completed Mar 21, 2018
Zegnat added a commit to Zegnat/php-mf2 that referenced this issue Mar 22, 2018
* Parse the rel attribute in accordance with the WHATWG spec:
  https://infra.spec.whatwg.org/#split-on-ascii-whitespace
* Only list unique rel values in the rel-urls output, fixes microformats#159:
  microformats/microformats2-parsing#30
* Sort the unique rel values alphabetically:
  microformats/microformats2-parsing#29
* Correctly merge attribute values into the resulting object.
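
The first three steps in the commit message above (split on ASCII whitespace, keep unique values, sort alphabetically) can be sketched in Python as follows. This is an illustration, not the php-mf2 implementation; the function names `split_on_ascii_whitespace` and `rels_for` are hypothetical. The character set is the one the WHATWG Infra standard defines as ASCII whitespace: tab, line feed, form feed, carriage return, and space.

```python
import re

# ASCII whitespace per the WHATWG Infra standard: TAB, LF, FF, CR, SPACE.
_ASCII_WHITESPACE_RUN = re.compile(r"[\t\n\f\r ]+")

def split_on_ascii_whitespace(value):
    """Split a string on runs of ASCII whitespace, dropping empty tokens."""
    return [token for token in _ASCII_WHITESPACE_RUN.split(value) if token]

def rels_for(rel_attr):
    """Hypothetical helper: unique rel values in alphabetical order."""
    return sorted(set(split_on_ascii_whitespace(rel_attr)))

print(rels_for(" me  author\tme "))
```

Note that a non-breaking space (U+00A0) is not ASCII whitespace, so under this sketch `"a\u00a0b"` stays a single token, which differs from what Python's bare `str.split()` would do.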
Zegnat added a commit to Zegnat/php-mf2 that referenced this issue Mar 24, 2018