reduce instances when p-name is implied #6
Comments
From the end of the wiki discussion, one straw proposal was: "any explicit p-* property on an element stops implied p-name" (this sounds a bit ambiguous and could be reworded, but I think the general intent / principle is workable) |
I'm not clear on what that proposal actually means. Does "stop" mean no implied name is generated at all? Does it mean that property with p-* is excluded from the implied name? |
Either possibility could be pursued. Let's start documenting examples of existing failures / misbehaviors so we can look at specifics. |
Here is an example that came up today: https://huffduffer.com/tags/indieweb This boils down to a structure like the one below:
<ol class="h-feed">
<li class="h-entry">
<h3 class="p-name">Episode Name</h3>
<div class="e-content">episode description...</div>
<audio class="u-audio" src="..."></audio>
<p class="p-author h-card">...</p>
</li>
</ol>
The result of the current implied p-name rule is that all the contents of the individual h-entry elements end up in the implied p-name of the h-feed. |
Would it make sense for implied p-name to skip content from h-* children? In the case above, that would leave the implied p-name of the h-feed empty. |
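A rough sketch of what skipping h-* children would mean for the h-feed example above. It is deliberately simplified: it treats the implied name as plain text content only and ignores the img/abbr special cases of the real parsing rules. The names `TextCollector` and `text_content` are hypothetical and not taken from any parser.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they must not change the depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class TextCollector(HTMLParser):
    """Collects text content, optionally skipping nested h-* subtrees."""

    def __init__(self, skip_nested):
        super().__init__()
        self.skip_nested = skip_nested
        self.depth = 0         # depth of the element currently open
        self.skip_from = None  # depth at which a skipped h-* subtree began
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        self.depth += 1
        classes = (dict(attrs).get("class") or "").split()
        # depth > 1 means "below the root", so the root's own h-* is ignored.
        nested_root = self.depth > 1 and any(c.startswith("h-") for c in classes)
        if self.skip_nested and self.skip_from is None and nested_root:
            self.skip_from = self.depth

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if self.skip_from == self.depth:
            self.skip_from = None  # leaving the skipped subtree
        self.depth -= 1

    def handle_data(self, data):
        if self.skip_from is None:
            self.parts.append(data)


def text_content(html, skip_nested=False):
    """Whitespace-normalised text of a fragment with a single root element."""
    collector = TextCollector(skip_nested)
    collector.feed(html)
    collector.close()
    return " ".join("".join(collector.parts).split())


FEED = """
<ol class="h-feed">
  <li class="h-entry">
    <h3 class="p-name">Episode Name</h3>
    <div class="e-content">episode description...</div>
  </li>
</ol>
"""

print(repr(text_content(FEED)))                    # entry text leaks into the feed
print(repr(text_content(FEED, skip_nested=True)))  # nothing left for the feed
```

With `skip_nested=True` the feed contributes no text of its own, which matches the "empty" outcome described above.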
Restating for clarity, and adding children as another way to stop implying p-name: "p-name MUST NOT be implied if there are any explicit p-* properties or any nested microformats". We need consensus positive feedback from parser developers and one proof of implementation to proceed (and no objections, obviously). Yes, you can thumbs-up to indicate positive feedback 😃 |
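The restated rule can be sketched as a simple check over a fragment. This is only an illustration of the wording quoted above, assuming a fragment with a single h-* root element and prefix matching on class names; `PropertyScanner` and `may_imply_name` are hypothetical names, not functions of mf2py or any other parser.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they must not change the depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class PropertyScanner(HTMLParser):
    """Looks for p-* and h-* class names on descendants of the root element."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.has_explicit_p = False
        self.has_nested_mf = False

    def handle_starttag(self, tag, attrs):
        if self.depth >= 1:  # only descendants count, not the root itself
            classes = (dict(attrs).get("class") or "").split()
            self.has_explicit_p |= any(c.startswith("p-") for c in classes)
            self.has_nested_mf |= any(c.startswith("h-") for c in classes)
        if tag not in VOID:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID:
            self.depth -= 1


def may_imply_name(html):
    """True if the quoted rule still allows an implied p-name for this root."""
    scanner = PropertyScanner()
    scanner.feed(html)
    scanner.close()
    return not (scanner.has_explicit_p or scanner.has_nested_mf)


SIMPLE_CARD = '<a class="h-card" href="https://example.com/">Jane Doe</a>'
FEED = ('<ol class="h-feed"><li class="h-entry">'
        '<h3 class="p-name">Episode Name</h3></li></ol>')

print(may_imply_name(SIMPLE_CARD))  # small microformat: implied name still allowed
print(may_imply_name(FEED))         # nested microformat: implied name stopped
```

Scanning all descendants is enough here because a p-* class inside a nested microformat can only occur when a nested h-* exists, and that already stops the implied name.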
Here's a Bridgy example: https://brid-gy.appspot.com/comment/twitter/miklb/948601132397588481/949306079623766016
|
I encountered a similar case today working on aaronpk/XRay#52 where a In this example, I would not expect the <div class="h-entry"><p class="e-content p-name">Hello World <img src="example.jpg"></p></div> From the authoring perspective, I would expect the act of defining the |
The implied
This rule looks pretty complex. It seems to exist mostly in case of some wrapper element being inserted between the root Either the existence of explicit properties should stop the implied parsing, or the rules should incorporate some form of limitation in case of the wrapper element. Adding |
Just a thought I just had in chat when I realised
Is there any reason why we limit this to <article class="h-entry">
<main class="e-content p-name">
<p>Post</p>
</main>
<footer>
<p>Published on <time class="dt-published" datetime="2017-11-07T13:27:49+01:00">7<sup>th</sup> November 2017 13:27:49</time>.</p>
<p class="editorlink">[<a rel="edit" href="#">edit</a>] [<a href="#" class="u-url u-uid">permalink</a>]</p>
</footer>
</article> This |
Tantek’s home page is another example of <li class="h-entry hentry as-note">
<p class="p-name entry-title e-content entry-content article">Post</p>
<span class="info footer"><a href="#" class="dt-published published dt-updated updated u-url u-uid"><time class="value" datetime="05:11-0800">05:11</time> on <time class="value">2018-01-14</time></a></span>
</li> It (like mine) has several other properties defined for a post but no additional |
Today I spotted another live example of a post where the parsed @jernst’s website uses the following HTML for note posts (edited to remove non-mf2 classes and empty unrelated attributes for easier reading): <li id="" class="h-entry">
<a href="" title="Posts by jernst ( @jernst )" class="">
<img alt='' src='' srcset='' class='' height='48' width='48' />
</a>
<h4>
<a href="" title="Posts by jernst ( @jernst )">jernst</a>
<span class="">
<abbr class="dt-published" title="2018-01-24T16:56:18Z">
08:56 <em>on</em> January 24, 2018
</abbr>
<span class="">
<a href="" class="u-url" title="Permalink">Permalink</a>
<a rel="nofollow" class="" href="">Log in to leave a Comment</a>
</span>
<span class=""> </span>
</span>
</h4>
<div id="" class="e-content">
<p>
My “invited guest post” (is that a thing?) on opportunities
of the #IndieWeb for small businesses went live on GoDaddy’s site:
<a href="" rel="nofollow">https://www.godaddy.com/garage/indieweb-facebook-opportunities/</a>
</p>
</div>
⋮
</li> This has a Also note that if the reply post linked by @gRegorLove had used |
Currently all mf2 items have a name property, because of implied name. If we change this, some code using the parsers could fail if it assumes a name is present. |
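For consuming code, one way to cope with items that no longer carry a name is to stop indexing `properties["name"]` directly. A minimal sketch, assuming the usual parsed mf2 JSON shape (`type` plus a `properties` object whose values are lists); the helper `first_name` is hypothetical.

```python
def first_name(item, default=""):
    """Return the first name value of a parsed mf2 item, or `default` if absent."""
    names = item.get("properties", {}).get("name", [])
    return names[0] if names else default


with_name = {"type": ["h-entry"], "properties": {"name": ["Episode Name"]}}
without_name = {"type": ["h-feed"], "properties": {}}

print(first_name(with_name))
print(repr(first_name(without_name)))
```

Passing `default=None` instead lets a caller tell "no name" apart from an empty name.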
Documenting from yesterday’s chat, because nobody could remember this and I can’t find it elsewhere:
So I think the (as of now) latest proposed spec change would be:
Here too I will just document an answer given in chat, this time by @aaronpk:
Assuming something like semver is being used, any major version bump should signify possible API changes to the user. I too don’t think that would be an issue. There might be an issue if someone is using parsers-as-a-service, e.g. always getting their mf2 parser output from |
I (and likely others) use xray.p3k.io as a service, so I will have to consider what to do in that case. It doesn't return the Microformats JSON, it converts it to its jf2 format first. I may just return an empty string for |
It might be worth opening an issue on jf2 to see if they want to keep an explicit The real question is, do you see any reasons for postponing this change because of your use of a mf2 parser as a service? I think not? |
Nope, wasn't intending to hold things up, just wanted to put that there for the record. I agree with the current proposal of having |
Is there a consensus on this issue? If yes, I can look into adding this to mf2py. |
At @kartikprabhu’s request here are some super simplified examples of HTML where unwanted metadata (a Case with
|
Implemented in mf2py here https://github.com/kartikprabhu/mf2py/tree/implied-name-fix Will push to the main version if this gets consensus and makes it to the spec. |
With an implementation for mf2py by @kartikprabhu and an intent to implement in mf2php by @gRegorLove, is that enough to update the spec? @tantek? |
It seems the output from mf2py depends on which internal HTML parser is being used (see: kartikprabhu/mf2py#58 (comment)), but this should count as an internal mf2py bug to be fixed. So I am +1 for this being included in the spec. |
Implied-name-stopping has been implemented in an experimental version of mf2py, so +1. |
This has been added to the spec, at revision 20:27, 4 March 2018. Resolved. |
This is a split-off of part of http://microformats.org/wiki/microformats2-parsing-issues#implied_properties_when_an_explicit_class_is_provided that was left unresolved without consensus (since that issue was resolved with just dealing with u-url and nothing more).
Experience has shown there are a number of instances where implied p-name produces something unhelpful. Typically this happens with otherwise large microformats that for some reason omit the name (e.g. an h-feed, or an h-entry that has no author-supplied name).
Implied p-name was designed for small microformats, e.g. just a hyperlink with an h-* class on it, or a simple set of nested elements (without siblings), plus a few similar use-cases. But once a microformat contains elements with multiple explicit properties, the current implied p-name rule rarely produces anything useful.
One thing that would help is links to specific examples where excessive or otherwise "useless" p-names are being implied, where no p-name would actually be preferable (from a consuming code point of view).