Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change first_of_value to first_of_hash, and add more test cases #33647

Merged

Conversation

ShadowJonathan
Copy link
Contributor

@ShadowJonathan ShadowJonathan commented Jan 19, 2025

Fixes #33646, and possibly many more parsing errors.

The rest of LinkDetailsExtractor expects these to be Hashes, but that's never enforced anywhere, except for here.

This will now catch:

  • "author": [[]]

This will also discard "author": ["John Mastodon"], but considering this was never expected or rectified anywhere (I checked), similar to "author": "John Mastodon", I assume this is alright.

This'll also now properly catch an author object in a multi-author array; "author": ["John", { ... }], here it'll catch the second value.

@mjankowski mjankowski added the ruby Pull requests that update Ruby code label Jan 20, 2025
Copy link
Contributor

@ClearlyClaire ClearlyClaire left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While this addresses the specific issue you reported, this would not fix issues with an additional level of array nesting and so on.

It's again about treating JSON-LD as JSON, while we should ideally treat them semantically but that comes with multiple issues:

  • it might require fetching remote documents
  • it's more expensive
  • we will fail on malformed documents which we can currently process

@ShadowJonathan
Copy link
Contributor Author

@ClearlyClaire currently, it fails correctly by recognising a malformed document as it is, since it expects to be returned hashes or nil, while it got an array (which is unexpected)

what kind of change should i add here? should i add is_a?(Hash) or return nil, or how should mastodon behave here?

or is this change "okay", but should it be noted that this area is up for refactor due to it treating that JSON-LD as JSON?

i'm just trying to clear my dead queue from bugs, and this seemed like an obvious one; make it fail/fallback properly on malformed data

@ClearlyClaire
Copy link
Contributor

@ClearlyClaire currently, it fails correctly by recognising a malformed document as it is, since it expects to be returned hashes or nil, while it got an array (which is unexpected)

Afaik that document is technically correct JSON-LD, the double-nested array is actually equivalent to a single array.

What I meant is that the only actual correct way to handle arbitrary JSON-LD documents is to treat them as JSON-LD, and changes like this PR are not enough to handle all valid cases (and possibly not always correct either? I'm not sure, but I think that behavior of nested arrays may depend on the container type of the attribute)

In essence, I'm ok with the change, but it's not the “correct” way and I'm not sure how to draw the line. Ideally, JSON-LD processing would be the way to go, but as I said earlier, it is far more expensive and has its own challenges.

@ShadowJonathan
Copy link
Contributor Author

Afaik that document is technically correct JSON-LD, the double-nested array is actually equivalent to a single array.

@ClearlyClaire Ah, then I know a change to add; .flattening the array, and picking the first element.

I just wasn't too sure if this was correct behaviour, so seeing it confirmed will have me implement this little bit of behaviour.

@ClearlyClaire
Copy link
Contributor

Afaik that document is technically correct JSON-LD, the double-nested array is actually equivalent to a single array.

@ClearlyClaire Ah, then I know a change to add; .flattening the array, and picking the first element.

I just wasn't too sure if this was correct behaviour, so seeing it confirmed will have me implement this little bit of behaviour.

Yes, I suppose this would be a good change here! Though like I said earlier, I'm not sure whether this would always be safe (but reasonably, we wouldn't expect the attributes names we use to map to something completely different, so that should be fine)

@ClearlyClaire ClearlyClaire added this pull request to the merge queue Jan 22, 2025
Merged via the queue into mastodon:main with commit b18caff Jan 22, 2025
31 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ruby Pull requests that update Ruby code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

LinkCrawlWorker TypeError: no implicit conversion of String into Integer
3 participants