Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle large chain calc better #17291

Merged
merged 4 commits into from
Jun 19, 2024
Merged

Conversation

erikjohnston
Copy link
Member

We calculate the auth chain links outside of the main persist event transaction to ensure that we do not block other event sending during the calculation.

We calculate the auth chain links outside of the main persist event
transaction to ensure that we do not block other event sending during
the calculation.
@erikjohnston erikjohnston marked this pull request as ready for review June 10, 2024 16:24
@erikjohnston erikjohnston requested a review from a team as a code owner June 10, 2024 16:24
@morguldir
Copy link
Contributor

morguldir commented Jun 17, 2024

  • Before:
    matrix-synapse-generic-worker-8fc645778-v4cv7 generic-worker 2024-06-17 16:12:35,614 - synapse.access.http.8083 - 473 - INFO - PUT-462- <ip> - 8083 - {@morguldir:sulian.eu} Processed request: 302.537sec/0.000sec (0.419sec, 0.052sec) (297.116sec/1.248sec/19) 59B 200 "PUT /_matrix/client/v3/rooms/!OGEhHVWSdvArJzumhm%3Amatrix.org/send/m.reaction/m1718640453033.4 HTTP/1.1" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36.0 (KHTML, like Gecko)
  • After:
    matrix-synapse-generic-worker-6449f78659-jgl26 generic-worker 2024-06-17 16:23:28,971 - synapse.access.http.8083 - 473 - INFO - PUT-131- <ip> - 8083 - {@morguldir:sulian.eu} Processed request: 271.939sec/0.140sec (0.415sec, 0.078sec) (13.365sec/15.000sec/19) 59B 200 "PUT /_matrix/client/v3/rooms/!OGEhHVWSdvArJzumhm%3Amatrix.org/send/m.reaction/m1718641136803.3 HTTP/1.1" "Mozilla/5.0 (X11; Linux x86_64; rv:127.0) Gecko/20100101 Firefox/127.0" [3 dbevts]
    Chrome/124.0.6367.243 Safari/537.36.0" [3 dbevts]`
  • With Partial revert #17044 to fix sql performance regression #17154
    matrix-synapse-generic-worker-685b6d5b8-g9p97 generic-worker 2024-06-17 16:43:04,772 - synapse.access.http.8083 - 473 - INFO - PUT-626- <ip> - 8083 - {@morguldir:sulian.eu} Processed request: 9.251sec/0.001sec (0.276sec, 0.040sec) (11.001sec/2.227sec/20) 59B 200 "PUT /_matrix/client/v3/rooms/!OGEhHVWSdvArJzumhm%3Amatrix.org/send/m.reaction/m1718642575413.4 HTTP/1.1" "Mozilla/5.0 (X11; Linux x86_64; rv:127.0) Gecko/20100101 Firefox/127.0" [3 dbevts]

So this doesn't seem to have helped too much in this case, example is from HQ

I keep seeing this in my logs as well, but not sure if it's relevant
matrix-synapse-generic-worker-685b6d5b8-g9p97 generic-worker 2024-06-17 16:42:57,777 - synapse.storage.databases.main.event_federation - 274 - INFO - GET-19- Unexpectedly found that events don't have chain IDs in room !OGEhHVWSdvArJzumhm:matrix.org: {'$e3j8FpK-lcibZO57q50gANeLSK0dptpF6fCUmBF3rKU', '$7_ZXqJstZZrXU1S-2jgKaEOpgrvWEXdU_NxQ6mxC4_8', '$QDzjkSbQgHTZPi8G2ZnzQ6TpUVmzx1Xf7wmbmWbUqWM'}

@erikjohnston
Copy link
Member Author

Hi @morguldir, yup, this PR won't cause the event itself to get persisted faster. Instead it ensures other events (in other rooms) don't get blocked from being sent down to the client.

Copy link
Member

@anoadragon453 anoadragon453 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a small optimisation and a couple documentation changes, otherwise LGTM!

synapse/storage/databases/main/events.py Show resolved Hide resolved
synapse/storage/databases/main/events.py Show resolved Hide resolved
@@ -100,6 +99,14 @@ def is_noop(self) -> bool:
return not self.to_delete and not self.to_insert and not self.no_longer_in_room


@attr.s(slots=True, auto_attribs=True)
class NewEventChainLinks:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a docstring to this object? I don't think it's particularly grokkable at a glance without going and seeing how it's used.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Struggled with the wording, LMK if that makes sense.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to re-read a couple times, but it's understandable 👍

It's certainly a fairly abstract concept to describe without a diagram.

synapse/storage/databases/main/events.py Show resolved Hide resolved
@@ -100,6 +99,14 @@ def is_noop(self) -> bool:
return not self.to_delete and not self.to_insert and not self.no_longer_in_room


@attr.s(slots=True, auto_attribs=True)
class NewEventChainLinks:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to re-read a couple times, but it's understandable 👍

It's certainly a fairly abstract concept to describe without a diagram.

@erikjohnston erikjohnston merged commit bdf82ef into develop Jun 19, 2024
38 checks passed
@erikjohnston erikjohnston deleted the erikj/calculate_auth_chain branch June 19, 2024 09:33
erikjohnston added a commit that referenced this pull request Jun 19, 2024
erikjohnston added a commit that referenced this pull request Jun 19, 2024
This reverts commit bdf82ef  (#17291)

This seems to have stopped persisting auth chains for new events, and so
is causing state res to fall back to the slow methods
erikjohnston added a commit that referenced this pull request Jun 20, 2024
We calculate the auth chain links outside of the main persist event
transaction to ensure that we do not block other event sending during
the calculation.
erikjohnston added a commit that referenced this pull request Jun 24, 2024
This is #17291 (which got reverted), with some added fixups, and change
so that tests actually pick up the error.

The problem was that we were not calculating any new chain IDs due to a
missing `not` in a condition.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants