Clean up endpoint regeneration lifecycle #37209

Open
@squeed

Description

The endpoint regeneration mechanism is a bit of a mess.

Regeneration is, naturally, serialized; we should not regenerate a given endpoint in parallel. Okay, fine; there is a queue with a single consumer.

However, this is still awkward, as it prevents multiple pending regeneration requests from being coalesced cleanly. Where we do coalesce multiple regeneration requests, the behavior is incorrect: only the first caller is blocked until completion, and all other callers return immediately. This is particularly bad, since those callers may erroneously assume that regeneration has succeeded.

Even more awkward, if regeneration fails, we trigger a separate controller to re-enqueue yet another request.

The Fix

  1. We should not use the endpoint's EventQueue for handling regeneration. Rather, we should have a single controller that handles all regeneration requests.
  2. All regeneration requests should be coalesced. All regeneration requests should be blocking.
  3. The desiredRegenerationLevel mechanism should be persisted.
  4. RegenMetadata.ParentContext should be removed. It is extremely error-prone.
  5. If regeneration fails, it should be retried with some backoff.

Additional cleanup:

  • Document exactly what the different regeneration levels do
  • Decide when we should and should not force policy recalculation; I suspect we should be more rigorous here
  • Consider making forced policy calculation another level between userspace and datapath
  • Audit metrics, see if they're lacking

Metadata

Labels

  • help-wanted: Please volunteer for this by adding yourself as an assignee!
  • kind/cleanup: This includes no functional changes.
  • sig/policy: Impacts whether traffic is allowed or denied based on user-defined policies.
