Change `first_of_value` to `first_of_hash`, and add more test cases #33647

ShadowJonathan · 2025-01-19T12:07:14Z

Fixes #33646, and possibly many more parsing errors.

The rest of LinkDetailsExtractor expects these to be Hashes, but that's never enforced anywhere, except for here.

This will now catch:

"author": [[]]

This will also discard "author": ["John Mastodon"], but considering this was never expected or rectified anywhere (I checked), similar to "author": "John Mastodon", I assume this is alright.

This'll also now properly catch an author object in a multi-author array; "author": ["John", { ... }], here it'll catch the second value.

ClearlyClaire

While this addresses the specific issue you reported, this would not fix issues with an additional level of array nesting and so on.

It's again about treating JSON-LD as JSON, while we should ideally treat them semantically but that comes with multiple issues:

it might require fetching remote documents
it's more expensive
we will fail on malformed documents which we can currently process

ShadowJonathan · 2025-01-21T20:10:09Z

@ClearlyClaire currently, it fails correctly by recognising a malformed document as it is, since it expects to be returned hashes or nil, while it got an array (which is unexpected)

what kind of change should i add here? should i add is_a?(Hash) or return nil, or how should mastodon behave here?

or is this change "okay", but should it be noted that this area is up for refactor due to it treating that JSON-LD as JSON?

i'm just trying to clear my dead queue from bugs, and this seemed like an obvious one; make it fail/fallback properly on malformed data

ClearlyClaire · 2025-01-22T09:09:55Z

@ClearlyClaire currently, it fails correctly by recognising a malformed document as it is, since it expects to be returned hashes or nil, while it got an array (which is unexpected)

Afaik that document is technically correct JSON-LD, the double-nested array is actually equivalent to a single array.

What I meant is that the only actual correct way to handle arbitrary JSON-LD documents is to treat them as JSON-LD, and changes like this PR are not enough to handle all valid cases (and possibly not always correct either? I'm not sure, but I think that behavior of nested arrays may depend on the container type of the attribute)

In essence, I'm ok with the change, but it's not the “correct” way and I'm not sure how to draw the line. Ideally, JSON-LD processing would be the way to go, but as I said earlier, it is far more expensive and has its own challenges.

ShadowJonathan · 2025-01-22T12:50:57Z

Afaik that document is technically correct JSON-LD, the double-nested array is actually equivalent to a single array.

@ClearlyClaire Ah, then I know a change to add; .flattening the array, and picking the first element.

I just wasn't too sure if this was correct behaviour, so seeing it confirmed will have me implement this little bit of behaviour.

ClearlyClaire · 2025-01-22T13:17:58Z

Afaik that document is technically correct JSON-LD, the double-nested array is actually equivalent to a single array.

@ClearlyClaire Ah, then I know a change to add; .flattening the array, and picking the first element.

I just wasn't too sure if this was correct behaviour, so seeing it confirmed will have me implement this little bit of behaviour.

Yes, I suppose this would be a good change here! Though like I said earlier, I'm not sure whether this would always be safe (but reasonably, we wouldn't expect the attributes names we use to map to something completely different, so that should be fine)

ShadowJonathan added 9 commits January 19, 2025 13:03

change first_of_value to first_of_hash, and add more test cases

17ab4fa

please the cop

e18f752

move first_of_hash test to link_details_extractor_spec.rb

c284a7c

add dummy html variable

b00f273

move tests to embedded describe section

d1e9c9f

RSpec.describe -> describe

71b3f42

rework tests to public interface instead

218ab3c

fix tests to use empty strings

65607ed

be -> eq, since strings have different object_id's

f6f7d31

mjankowski added the ruby Pull requests that update Ruby code label Jan 20, 2025

ClearlyClaire reviewed Jan 21, 2025

View reviewed changes

add array flattening

4daeaef

ShadowJonathan requested a review from ClearlyClaire January 22, 2025 13:51

ClearlyClaire approved these changes Jan 22, 2025

View reviewed changes

ClearlyClaire enabled auto-merge January 22, 2025 14:00

ClearlyClaire added this pull request to the merge queue Jan 22, 2025

Merged via the queue into mastodon:main with commit b18caff Jan 22, 2025
31 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Change `first_of_value` to `first_of_hash`, and add more test cases #33647

Change `first_of_value` to `first_of_hash`, and add more test cases #33647

ShadowJonathan commented Jan 19, 2025 •

edited

Loading

ClearlyClaire left a comment

ShadowJonathan commented Jan 21, 2025

ClearlyClaire commented Jan 22, 2025

ShadowJonathan commented Jan 22, 2025

ClearlyClaire commented Jan 22, 2025

Change first_of_value to first_of_hash, and add more test cases #33647

Change first_of_value to first_of_hash, and add more test cases #33647

Conversation

ShadowJonathan commented Jan 19, 2025 • edited Loading

ClearlyClaire left a comment

Choose a reason for hiding this comment

ShadowJonathan commented Jan 21, 2025

ClearlyClaire commented Jan 22, 2025

ShadowJonathan commented Jan 22, 2025

ClearlyClaire commented Jan 22, 2025

Change `first_of_value` to `first_of_hash`, and add more test cases #33647

Change `first_of_value` to `first_of_hash`, and add more test cases #33647

ShadowJonathan commented Jan 19, 2025 •

edited

Loading