Define the order of items any time an array is used in the parsed output. #29
Comments
There is precedent for using alphabetical sorting. The PHP parser already does this:

```html
<div class="h-entry h-cite h-entry"></div>
```

```json
"type": [
  "h-cite",
  "h-entry",
  "h-entry"
]
```

The development version of the Python parser has also been updated to use alphabetical sorting. Unlike the PHP output, it also drops the duplicate `h-entry`:

```html
<div class="h-entry h-cite h-entry"></div>
```

```json
"type": [
  "h-cite",
  "h-entry"
]
```
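The difference between the two outputs above can be sketched in Python. This is a minimal illustration, not either parser's actual code: `root_class_names`, `sorted_only` and `sorted_unique` are hypothetical helpers, and root class detection is simplified to "any token starting with `h-`", which is looser than the real microformats2 rule.

```python
import re

# ASCII whitespace as defined by the WHATWG Infra standard:
# tab, line feed, form feed, carriage return and space.
ASCII_WHITESPACE = re.compile(r"[\t\n\f\r ]+")


def root_class_names(class_attr: str) -> list[str]:
    """Return h-* class names in source order (simplified root-class test)."""
    tokens = [t for t in ASCII_WHITESPACE.split(class_attr) if t]
    return [t for t in tokens if t.startswith("h-")]


def sorted_only(class_attr: str) -> list[str]:
    # Alphabetical order, duplicates kept (matches the PHP output shown above).
    return sorted(root_class_names(class_attr))


def sorted_unique(class_attr: str) -> list[str]:
    # Alphabetical order, duplicates removed (matches the Python output shown above).
    return sorted(set(root_class_names(class_attr)))


example = "h-entry h-cite h-entry"
print(sorted_only(example))    # ['h-cite', 'h-entry', 'h-entry']
print(sorted_unique(example))  # ['h-cite', 'h-entry']
```

Only the second variant treats `class` as a true set, which is why the two parsers still disagree even though both sort.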
As I wrote in http://tantek.com/2018/079/t2 (#30), the `type` array must not convey any ordering semantics from the source (thus it must enforce an artificial order that does not convey anything, in addition to uniqueness per set requirements). All other uses of arrays in the parsed JSON output (children, property values, rel subarrays) already convey appropriate document order semantics. (Originally published at: http://tantek.com/2018/079/t3/)
+1
Spec has been updated so both
* Parse the rel attribute in accordance with the WHATWG spec: https://infra.spec.whatwg.org/#split-on-ascii-whitespace
* Only list unique rel values in the rel-urls output, fixes microformats#159: microformats/microformats2-parsing#30
* Sort the unique rel values alphabetically: microformats/microformats2-parsing#29
* Correctly merge attribute values into the resulting object.
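The first three points of that change can be sketched as follows. This is a hedged illustration, not the commit's code; `split_on_ascii_whitespace` and `rel_values` are hypothetical names. It splits only on the five ASCII whitespace characters the Infra standard defines, because Python's `str.split()` with no arguments also splits on other whitespace such as U+00A0 and U+000B.

```python
import re

# ASCII whitespace per the WHATWG Infra standard:
# U+0009 TAB, U+000A LF, U+000C FF, U+000D CR, U+0020 SPACE.
ASCII_WHITESPACE = re.compile(r"[\t\n\f\r ]+")


def split_on_ascii_whitespace(value: str) -> list[str]:
    """Split on runs of ASCII whitespace, dropping empty leading/trailing tokens."""
    return [token for token in ASCII_WHITESPACE.split(value) if token]


def rel_values(rel_attr: str) -> list[str]:
    """Unique rel values, sorted alphabetically so no source order is implied."""
    return sorted(set(split_on_ascii_whitespace(rel_attr)))


print(rel_values("me  author\tme"))                  # ['author', 'me']
print(rel_values("a\u00a0b c") == ["a\u00a0b", "c"])  # True: U+00A0 stays inside the token
```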
The JSON specification, as defined in both RFC 8259 and RFC 7159 (the basis of I-JSON, see #23), states:

> An array is an ordered sequence of zero or more values.
The key word here is *ordered*. The two arrays `["red", "blue"]` and `["blue", "red"]` are different JSON values because their order is different. It follows that two microformats parser implementations that generate different arrays from the same input HTML can be said to be incompatible with each other, as they produce distinctly different output (as seen in #22).

The microformats parsing specification should fix this by specifying which order to use any time an array is used.
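A quick way to see the distinction, using Python's standard `json` module as an illustration of the JSON data model (not a normative check): decoded arrays compare by position, while decoded objects compare regardless of member order.

```python
import json

a = json.loads('["red", "blue"]')
b = json.loads('["blue", "red"]')
print(a == b)  # False: arrays are ordered, so order is part of the value

x = json.loads('{"color": "red", "size": 1}')
y = json.loads('{"size": 1, "color": "red"}')
print(x == y)  # True: object members are unordered
```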
Most of the arrays should follow document order. When filling the `items` or `children` arrays with microformat structures, it is important to keep document order, as consumers may need to find the first occurrence of a specific object. (Example: the authorship discovery algorithm depends on being able to access the first `h-card` matching specific constraints.)

But some arrays should not follow document order, as they represent semantically unordered collections. As an example, consider two `div` elements whose class names appear in a different order in the source. That order does not matter: both have an identical set of classes.

Because there is no way to have unordered arrays in JSON, the microformats specification should define an arbitrary sort for these cases.
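A minimal sketch of why a fixed sort is needed, assuming two hypothetical `div` elements with `class="h-entry h-cite"` and `class="h-cite h-entry"` (the original HTML example is not reproduced here). `class_set` and `canonical_json` are illustrative helpers, not part of any parser.

```python
import json
import re

ASCII_WHITESPACE = re.compile(r"[\t\n\f\r ]+")


def class_set(class_attr: str) -> frozenset[str]:
    """The class attribute as the unordered set HTML defines it to be."""
    return frozenset(t for t in ASCII_WHITESPACE.split(class_attr) if t)


def canonical_json(class_attr: str) -> str:
    """Serialize the set using an arbitrary but fixed (alphabetical) order."""
    return json.dumps(sorted(class_set(class_attr)))


first = "h-entry h-cite"   # hypothetical first <div>
second = "h-cite h-entry"  # hypothetical second <div>

print(class_set(first) == class_set(second))                    # True: same set
print(json.dumps(first.split()) == json.dumps(second.split()))  # False: source order leaks
print(canonical_json(first) == canonical_json(second))          # True: identical output
print(canonical_json(first))                                    # ["h-cite", "h-entry"]
```

Alphabetical order is used here only because it is simple and deterministic; any fixed order would give identical output for equal sets.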
Using source order here would be a bad idea. It could lead people to interpret the order as meaningful, or as something consuming code can rely on when it shouldn't. (Relying on source order here could become a source of bugs.) Thus, data “derived from unordered sets in the source HTML MUST NOT imply any source order”.
The `class` and `rel` attributes in HTML are the only attributes microformats parsing depends on that are sets in the source HTML, where order does not matter. These are mapped to arrays in `type` and `rels` respectively.

The proposed solution is to use document order for all arrays, except the `type` and `rels` arrays, which should be in alphabetical order.