Change `first_of_value` to `first_of_hash`, and add more test cases #33647
Conversation
While this addresses the specific issue you reported, this would not fix issues with an additional level of array nesting and so on.
It's again about treating JSON-LD as plain JSON, while we should ideally treat it semantically, but that comes with multiple issues:
- it might require fetching remote documents
- it's more expensive
- we will fail on malformed documents which we can currently process
@ClearlyClaire Currently it fails correctly by recognising the malformed document for what it is, since it expects to be returned hashes or nil, while it got an array (which is unexpected). What kind of change should I add here, or is this change "okay" as it stands, provided it's noted that this area is up for refactoring due to it treating JSON-LD as plain JSON? I'm just trying to clear my dead queue of bugs, and this seemed like an obvious one: make it fail/fall back properly on malformed data.
Afaik that document is technically correct JSON-LD; the double-nested array is actually equivalent to a single array. What I meant is that the only actually correct way to handle arbitrary JSON-LD documents is to treat them as JSON-LD, and changes like this PR are not enough to handle all valid cases (and possibly not always correct either? I'm not sure, but I think the behavior of nested arrays may depend on the container type of the attribute). In essence, I'm ok with the change, but it's not the “correct” way and I'm not sure how to draw the line. Ideally, JSON-LD processing would be the way to go, but as I said earlier, it is far more expensive and has its own challenges.
@ClearlyClaire Ah, then I know a change to add; I just wasn't too sure if this was correct behaviour, so seeing it confirmed will have me implement this little bit of behaviour.
Yes, I suppose this would be a good change here! Though like I said earlier, I'm not sure whether this would always be safe (but reasonably, we wouldn't expect the attribute names we use to map to something completely different, so that should be fine).
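For illustration, here is a minimal Ruby sketch of the nesting problem discussed above. The data and the helper are hypothetical stand-ins (a one-level unwrap), not the actual `LinkDetailsExtractor` code or the document from the original report:

```ruby
require 'json'

# Hypothetical one-level unwrap, roughly the kind of behaviour being replaced:
# if the value is an array, take its first element; otherwise return it as-is.
def first_of_value(value)
  value.is_a?(Array) ? value.first : value
end

# Valid JSON-LD: a double-nested array is semantically equivalent to a single array.
author = JSON.parse('{"author": [[{"@type": "Person", "name": "John Mastodon"}]]}')['author']

first_of_value(author).class # => Array, while the rest of the extractor expects a Hash (or nil)
```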
Fixes #33646, and possibly many more parsing errors.

The rest of `LinkDetailsExtractor` expects these to be `Hash`es, but that's never enforced anywhere, except for here. This will now catch:

`"author": [[]]`

This will also discard `"author": ["John Mastodon"]`, but considering this was never expected or rectified anywhere (I checked), similar to `"author": "John Mastodon"`, I assume this is alright.

This'll also now properly catch an author object in a multi-author array; with `"author": ["John", { ... }]`, it'll catch the second value.
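For reference, a minimal sketch of a `first_of_hash`-style helper matching the behaviour described above; this is an assumed illustration, not the actual diff in this PR:

```ruby
# Sketch: return the first Hash found in the value, or nil otherwise.
# Deliberately non-recursive, matching the discussion above: a hash buried in a
# doubly nested array is not dug out, it simply no longer leaks a non-Hash value.
def first_of_hash(value)
  value = [value] unless value.is_a?(Array)
  value.find { |item| item.is_a?(Hash) }
end

first_of_hash([[]])                                    # => nil, "author": [[]] no longer breaks parsing
first_of_hash(['John Mastodon'])                       # => nil, plain string authors are discarded
first_of_hash(['John', { 'name' => 'John Mastodon' }]) # => {"name"=>"John Mastodon"}, the second value
first_of_hash({ 'name' => 'John Mastodon' })           # => the hash itself, as before
```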