reduce instances when p-name is implied #6

tantek · 2017-03-15T18:20:10Z

This is a split-off of part of http://microformats.org/wiki/microformats2-parsing-issues#implied_properties_when_an_explicit_class_is_provided that was left unresolved without consensus (since that issue was resolved with just dealing with u-url and nothing more).

Experience has shown there are a number of instances where implied p-name produces something unhelpful. Typically this happens with otherwise large microformats which for some reason omitted the name (e.g. an h-feed, or h-entry that has no author supplied name etc.).

The use-case for which implied p-name was designed was for small microformats, e.g. just a hyperlink with h-* class on it, or maybe just a simple set of nested elements (without siblings). It has some similar use-cases too. But by the point you have elements with multiple explicit properties specified inside, rarely does the current implied p-name rule produce anything useful.

One thing that would help is links to specific examples where excessive or otherwise "useless" p-names are being implied, where no p-name would actually be preferable (from a consuming code point of view).

tantek · 2017-03-15T18:20:45Z

From the end of the wiki discussion, one straw proposal was:

"any explicit p-* property on an element stops implied p-name"

(this sounds a bit ambiguous and could be reworded, but I think the general intent / principle is workable)

aaronpk · 2017-03-15T18:22:58Z

I'm not clear on what that proposal actually means. Does "stop" mean no implied name is generated at all? Does it mean that property with p-* is excluded from the implied name?

tantek · 2017-03-15T18:24:24Z

Either possibility could be pursued. Let's start documenting examples of existing failures / misbehaviors so we can look at specifics.

aaronpk · 2017-03-15T18:28:57Z

Here is an example that came up today: https://huffduffer.com/tags/indieweb

This boils down to a structure like below:

<ol class="h-feed">
  <li class="h-entry">
    <h3 class="p-name">Episode Name</h3>
    <div class="e-content">episode description...</div>
    <audio class="u-audio" src="..."></audio>
    <p class="p-author h-card">...</p>
  </li>
</ol>

The result of the current implied p-name rule is all the contents of the individual h-entrys end up being in the implied p-name of the h-feed.

mblaney · 2017-03-16T03:43:15Z

Would it make sense for implied p-name to skip content from h-* children? It would mean in the above case that the implied p-name for the above h-feed is empty.

tantek · 2018-01-05T18:44:03Z

Restating for clarity, and adding children as another way to stop implying p-name

"p-name MUST NOT be implied if there are any explicit p-* properties or any nested microformats"

Need consensus positive feedback from parser developers and one proof of implementation to proceed (and no objections obviously). Yes you can thumbs-up to indicate positive feedback 😃

gRegorLove · 2018-01-05T18:44:22Z

Here's a Bridgy example: https://brid-gy.appspot.com/comment/twitter/miklb/948601132397588481/949306079623766016

{
    "items": [
        {
            "type": [
                "h-entry"
            ],
            "properties": {
                "author": [
                    {
                        "type": [
                            "h-card"
                        ],
                        "properties": {
                            "uid": [
                                "tag:twitter.com,2013:deaaqua"
                            ],
                            "numeric-id": [
                                "26367081"
                            ],
                            "name": [
                                "Amanda Rocksalot"
                            ],
                            "nickname": [
                                "deaaqua"
                            ],
                            "url": [
                                "https://www.instagram.com/deaaqua/",
                                "https://twitter.com/deaaqua"
                            ],
                            "photo": [
                                "https://pbs.twimg.com/profile_images/1811120958/Buttercups_and_Books_by_pinkparis1233.jpg"
                            ]
                        },
                        "value": "Amanda Rocksalot"
                    }
                ],
                "category": [
                    {
                        "type": [
                            "h-card"
                        ],
                        "properties": {
                            "uid": [
                                "tag:twitter.com,2013:GatoIshwary"
                            ],
                            "name": [
                                "Gato Ishwary"
                            ],
                            "url": [
                                "https://twitter.com/GatoIshwary"
                            ]
                        },
                        "value": "https://twitter.com/GatoIshwary"
                    },
                    {
                        "type": [
                            "h-card"
                        ],
                        "properties": {
                            "uid": [
                                "tag:twitter.com,2013:miklb"
                            ],
                            "name": [
                                "Michael Bishop"
                            ],
                            "url": [
                                "https://twitter.com/miklb",
                                "https://miklb.com/"
                            ]
                        },
                        "value": "https://twitter.com/miklb"
                    }
                ],
                "uid": [
                    "tag:twitter.com,2013:949306079623766016"
                ],
                "url": [
                    "https://twitter.com/deaaqua/status/949306079623766016"
                ],
                "video": [
                    "https://video.twimg.com/tweet_video/DSyc1j3VQAswKgR.mp4"
                ],
                "in-reply-to": [
                    "https://twitter.com/GatoIshwary/status/949250988946395137",
                    "https://miklb.com/2018/01/3012/"
                ],
                "published": [
                    "2018-01-05T15:46:21+00:00"
                ],
                "name": [
                    "tag:twitter.com,2013:949306079623766016\n  \n  2018-01-05T15:46:21+00:00\n      Amanda Rocksalot\n\n    deaaqua\n    \n\n  https://twitter.com/deaaqua/status/949306079623766016\n  \n  \n  \n   \nYour browser does not support the video tag. Click here to view directly. \n     Gato Ishwary\n    \n    \n  \n\n  \n     Michael Bishop"
                ]
            }
        }
    ],
    "rels": {},
    "debug": {
        "package": "https://packagist.org/packages/mf2/mf2",
        "version": "v0.3.2",
        "note": [
            "This output was generated from the php-mf2 library available at https://github.com/indieweb/php-mf2",
            "Please file any issues with the parser at https://github.com/indieweb/php-mf2/issues"
        ]
    }
}

aaronpk · 2018-01-12T16:32:01Z

I encountered a similar case today working on aaronpk/XRay#52 where a photo property seemed to come out of nowhere.

In this example, I would not expect the photo property to be implied:

<div class="h-entry"><p class="e-content p-name">Hello World <img src="example.jpg"></p></div>

From the authoring perspective, I would expect the act of defining the e-content class should prevent the img tag inside from turning into an implied property on the parent h-entry.

Zegnat · 2018-01-12T17:37:16Z

The implied photo in @aaronpk’s example is done through the rule:

else if .h-x>:only-child:not[.h-*]>img[src]:only-of-type:not[.h-*], then use that img’s src for photo

This rule looks pretty complex. It seems to exist mostly in case of some wrapper element being inserted between the root h-x and the data we want to use for extracting implied values. That’s what I assume to be the original use case because the in-between element has to be the only element within the root h-x.

Either the existence of explicit properties should stop the implied parsing, or the rules should incorporate some form of limitation in case of the wrapper element. Adding :not[.p-*]:not[.u-*]:not[.dt-*]:not[.e-*] to the :only-child in-between element selector would be an option here, but I personally don’t like making these implied values rules even bigger and more convoluted. They are already a source of many questions.

Zegnat · 2018-01-14T23:08:37Z

Just a thought I just had in chat when I realised e-content wasn’t going to stop a name property to be implied per @tantek’s proposed new limitation:

"p-name MUST NOT be implied if there are any explicit p-* properties or any nested microformats"

Is there any reason why we limit this to p- properties only and not all explicit properties? I still would not be able to remove the redundant p-name from my blog posts. Here is my HTML:

    <article class="h-entry">
      <main class="e-content p-name">
        <p>Post</p>
      </main>
      <footer>
        <p>Published on <time class="dt-published" datetime="2017-11-07T13:27:49+01:00">7<sup>th</sup> November 2017 13:27:49</time>.</p>
                <p class="editorlink">[<a rel="edit" href="#">edit</a>] [<a href="#" class="u-url u-uid">permalink</a>]</p>
      </footer>
    </article>

This h-entry includes an e- property, a dt- property, and two u- properties. So without p-name there are no other p- properties to stop the name from being implied.

Zegnat · 2018-01-14T23:44:59Z

Tantek’s home page is another example of h-entry markup that will need to keep explicitly setting p-name because it would get implied wrong otherwise:

<li class="h-entry hentry as-note">
<p class="p-name entry-title e-content entry-content article">Post</p>
<span class="info footer"><a href="#" class="dt-published published dt-updated updated u-url u-uid"><time class="value" datetime="05:11-0800">05:11</time> on <time class="value">2018-01-14</time></a></span>
</li>

It (like mine) has several other properties defined for a post but no additional p- properties.

Zegnat · 2018-01-28T11:46:28Z

Today I spotted another live example of a post where the parsed name will not be fixed by the proposed change. Hat tip to @gRegorLove, who linked to Upon 2020 first. See chat. The post he linked to includes a p-in-reply-to and will indeed be fixed by the proposed change. But…

@jernst’s website uses the following HTML for note posts (edited to remove non-mf2 classes and emty unrelated attributes for easier reading):

<li id="" class="h-entry">
  <a href="" title="Posts by jernst ( @jernst )" class="">
    <img alt='' src='' srcset='' class='' height='48' width='48' />
  </a>
  <h4>
    <a href="" title="Posts by jernst ( @jernst )">jernst</a>
    <span class="">
      <abbr class="dt-published" title="2018-01-24T16:56:18Z">
        08:56 <em>on</em> January 24, 2018
      </abbr>
      <span class="">
        <a href="" class="u-url" title="Permalink">Permalink</a>
        <a rel="nofollow" class="" href="">Log in to leave a Comment</a>
      </span>
      <span class="">&nbsp;</span>
    </span>
  </h4>
  <div id="" class="e-content">
    <p>
      My &#8220;invited guest post&#8221; (is that a thing?) on opportunities
      of the #IndieWeb for small businesses went live on GoDaddy&#8217;s site:
      <a href="" rel="nofollow">https://www.godaddy.com/garage/indieweb-facebook-opportunities/</a>
    </p>
  </div>
  ⋮
</li>

This has a u-url, dt-published, and e-content. Even under the current proposal, as there is no other p- property, the name will be implied and include meta data. (As shown by Loqi).

Also note that if the reply post linked by @gRegorLove had used u-in-reply-to it wouldn’t be fixed either. This is a great example of why I do not think we should limit it to p- properties. That post would not be expressing any difference in intent between u-in-reply-to and p-in-reply-to but the parsers will assign it an implied name in one case and not for the other. I think this will lead to extra confusion.

kevinmarks · 2018-02-16T14:24:54Z

Currently all mf2 items have a name property, because of implied name. If we change this, some code using the parsers could fail if it assumes a name is present.
Not sure if this a key issue.

Zegnat · 2018-02-16T17:14:40Z

Documenting from yesterday’s chat, because nobody could remember this and I can’t find it elsewhere:

@tantek would like to keep parser changes conservative but is “amenable to expanding to any p- or e- = no implied p-name”.
I have not been able to document any cases where more than this is necessary either.

So I think the (as of now) latest proposed spec change would be:

p-name MUST NOT be implied if there are any explicit p-* or e-* properties, or any nested microformats.

Currently all mf2 items have a name property, because of implied name. If we change this, some code using the parsers could fail if it assumes a name is present.

Here too I will just document an answer given in chat, this time by @aaronpk:

may just mean it has to be released as a major version number [of the parsers]

Assuming something like semver is being used, any major version bump should signify possible API changes to the user. I too don’t think that would be an issue.

There might be an issue if someone is using parsers-as-a-service, e.g. always getting their mf2 parser output from php.microformats.io. But I don’t think anyone ever advertised their online parsers as a service?

aaronpk · 2018-02-16T17:17:57Z

I (and likely others) use xray.p3k.io as a service, so I will have to consider what to do in that case. It doesn't return the Microformats JSON, it converts it to its jf2 format first. I may just return an empty string for name if there is no mf2 name property.

Zegnat · 2018-02-16T17:46:14Z

It doesn’t return the Microformats JSON, it converts it to its jf2 format first.

It might be worth opening an issue on jf2 to see if they want to keep an explicit name property. Their “author” syntax says it “consists of a name” and more, but that’s not marked as a MUST. For a jf2feed on the other hand name is a SHOULD.

The real question is, do you see any reasons for postponing this change because of your use of a mf2 parser as a service? I think not?

aaronpk · 2018-02-16T17:49:56Z

Nope, wasn't intending to hold things up, just wanted to put that there for the record.

I agree with the current proposal of having p-* and e-* and h-* stop the implied name.

kartikprabhu · 2018-02-17T18:54:43Z

Is there a consensus on this issue? If yes, I can look into adding this to mf2py.

Zegnat · 2018-02-17T21:38:50Z

At @kartikprabhu’s request here are some super simplified examples of HTML where unwanted metadata (a footer of an article) would be swept up in the name even though I only want content or like-of to show. Before and afters have been included. Output taken from the PHP parser.

Case with `e-content` and no `p-name`

HTML

<article class="h-entry">
  <div class="e-content">
    <p>Wanted content.</p>
  </div>
  <footer>
    <p>Footer to be ignored.</p>
  </footer>
</article>

Current output

{
    "items": [
        {
            "type": [
                "h-entry"
            ],
            "properties": {
                "content": [
                    {
                        "html": "\r\n    <p>Wanted content.</p>\r\n  ",
                        "value": "Wanted content."
                    }
                ],
                "name": [
                    "Wanted content. \r\n   \r\n  Footer to be ignored."
                ]
            }
        }
    ],
    "rels": {}
}

Expected output

{
    "items": [
        {
            "type": [
                "h-entry"
            ],
            "properties": {
                "content": [
                    {
                        "html": "\r\n    <p>Wanted content.</p>\r\n  ",
                        "value": "Wanted content."
                    }
                ]
            }
        }
    ],
    "rels": {}
}

Case with `p-content` and no `p-name`

HTML

<article class="h-entry">
  <div class="p-content">
    <p>Wanted content.</p>
  </div>
  <footer>
    <p>Footer to be ignored.</p>
  </footer>
</article>

Current output

{
    "items": [
        {
            "type": [
                "h-entry"
            ],
            "properties": {
                "content": [
                    "Wanted content."
                ],
                "name": [
                    "Wanted content. \r\n   \r\n  Footer to be ignored."
                ]
            }
        }
    ],
    "rels": {}
}

Expected output

{
    "items": [
        {
            "type": [
                "h-entry"
            ],
            "properties": {
                "content": [
                    "Wanted content."
                ]
            }
        }
    ],
    "rels": {}
}

Case with nested microformat and no `p-name`

HTML

<article class="h-entry">
  <div class="u-like-of h-cite">
    <p>I really like <a class="p-name u-url" href="http://microformats.org/">Microformats</a></p>
  </div>
  <footer>
    <p>Footer to be ignored.</p>
  </footer>
</article>

Current output

{
    "items": [
        {
            "type": [
                "h-entry"
            ],
            "properties": {
                "like-of": [
                    {
                        "type": [
                            "h-cite"
                        ],
                        "properties": {
                            "name": [
                                "Microformats"
                            ],
                            "url": [
                                "http://microformats.org/"
                            ]
                        },
                        "value": "http://microformats.org/"
                    }
                ],
                "name": [
                    "I really like Microformats \r\n   \r\n  Footer to be ignored."
                ]
            }
        }
    ],
    "rels": {}
}

Expected output

{
    "items": [
        {
            "type": [
                "h-entry"
            ],
            "properties": {
                "like-of": [
                    {
                        "type": [
                            "h-cite"
                        ],
                        "properties": {
                            "name": [
                                "Microformats"
                            ],
                            "url": [
                                "http://microformats.org/"
                            ]
                        },
                        "value": "http://microformats.org/"
                    }
                ]
            }
        }
    ],
    "rels": {}
}

kartikprabhu · 2018-02-17T22:30:25Z

Implemented in mf2py here https://github.com/kartikprabhu/mf2py/tree/implied-name-fix Will push to the main version if this gets consensus and makes it to the spec.

Zegnat · 2018-02-18T12:18:07Z

With an implementation for mf2py by @kartikprabhu and an intent to implement in mf2php by @gRegorLove, is that enough to update the spec? @tantek?

kartikprabhu · 2018-02-18T17:58:59Z

it seems the output from mf2py depends on which internal HTML parser is being used ( see: kartikprabhu/mf2py#58 (comment) ), but this should count as an internal bug for mf2py to be fixed.

So I am +1 for this being included in the spec.

kartikprabhu · 2018-02-19T16:45:19Z

implied-name-stopping has been implemented in an experimental version of mf2py.

so +1

Zegnat · 2018-03-04T20:32:18Z

This has been added to the spec, at revision 20:27, 4 March 2018. Resolved.

aaronpk mentioned this issue Jan 12, 2018

incorrectly including a photo property aaronpk/XRay#55

Closed

snarfed mentioned this issue Feb 15, 2018

Instagram->HTML: photos with no caption are missing a blank name property snarfed/granary#131

Closed

kartikprabhu mentioned this issue Feb 17, 2018

reduce instances when p-name is implied kartikprabhu/mf2py#58

Closed

aaronpk mentioned this issue Feb 25, 2018

reduce instances where photo is implied #21

Closed

gRegorLove added a commit to gRegorLove/php-mf2 that referenced this issue Mar 4, 2018

Add failing tests and fix for microformats/microformats2-parsing#6

c390d69

gRegorLove mentioned this issue Mar 4, 2018

Reduce instances when p-name is implied microformats/php-mf2#142

Merged

kartikprabhu mentioned this issue Mar 5, 2018

new implied name parsing rules and better backcompat microformats/mf2py#94

Merged

gRegorLove mentioned this issue Mar 11, 2018

backcompat child of mf2 root #11

Closed

tantek closed this as completed May 29, 2018

This was referenced Jul 2, 2018

Updates for new implied p-name rules microformats/microformats-ruby#86

Closed

Microformats markup on spoiler posts is wrong mastodon/mastodon#7926

Closed

EdwardHinkle mentioned this issue Jul 9, 2018

Implied p-name issues voxpelli/webpage-webmentions#75

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

reduce instances when p-name is implied #6

reduce instances when p-name is implied #6

tantek commented Mar 15, 2017 •

edited

Loading

tantek commented Mar 15, 2017

aaronpk commented Mar 15, 2017

tantek commented Mar 15, 2017

aaronpk commented Mar 15, 2017

mblaney commented Mar 16, 2017

tantek commented Jan 5, 2018 •

edited

Loading

gRegorLove commented Jan 5, 2018

aaronpk commented Jan 12, 2018

Zegnat commented Jan 12, 2018

Zegnat commented Jan 14, 2018

Zegnat commented Jan 14, 2018

Zegnat commented Jan 28, 2018

kevinmarks commented Feb 16, 2018

Zegnat commented Feb 16, 2018

aaronpk commented Feb 16, 2018

Zegnat commented Feb 16, 2018

aaronpk commented Feb 16, 2018

kartikprabhu commented Feb 17, 2018

Zegnat commented Feb 17, 2018 •

edited

Loading

kartikprabhu commented Feb 17, 2018

Zegnat commented Feb 18, 2018

kartikprabhu commented Feb 18, 2018

kartikprabhu commented Feb 19, 2018

Zegnat commented Mar 4, 2018

reduce instances when p-name is implied #6

reduce instances when p-name is implied #6

Comments

tantek commented Mar 15, 2017 • edited Loading

tantek commented Mar 15, 2017

aaronpk commented Mar 15, 2017

tantek commented Mar 15, 2017

aaronpk commented Mar 15, 2017

mblaney commented Mar 16, 2017

tantek commented Jan 5, 2018 • edited Loading

gRegorLove commented Jan 5, 2018

aaronpk commented Jan 12, 2018

Zegnat commented Jan 12, 2018

Zegnat commented Jan 14, 2018

Zegnat commented Jan 14, 2018

Zegnat commented Jan 28, 2018

kevinmarks commented Feb 16, 2018

Zegnat commented Feb 16, 2018

aaronpk commented Feb 16, 2018

Zegnat commented Feb 16, 2018

aaronpk commented Feb 16, 2018

kartikprabhu commented Feb 17, 2018

Zegnat commented Feb 17, 2018 • edited Loading

Case with e-content and no p-name

HTML

Current output

Expected output

Case with p-content and no p-name

HTML

Current output

Expected output

Case with nested microformat and no p-name

HTML

Current output

Expected output

kartikprabhu commented Feb 17, 2018

Zegnat commented Feb 18, 2018

kartikprabhu commented Feb 18, 2018

kartikprabhu commented Feb 19, 2018

Zegnat commented Mar 4, 2018

tantek commented Mar 15, 2017 •

edited

Loading

tantek commented Jan 5, 2018 •

edited

Loading

Zegnat commented Feb 17, 2018 •

edited

Loading

Case with `e-content` and no `p-name`

Case with `p-content` and no `p-name`

Case with nested microformat and no `p-name`