fix(connlib): discard timer once it fired #7288

Merged on Nov 8, 2024 (4 commits)

thomaseizinger
Member

Within connlib, we have many nested state machines. Many of them have internal timers, expressed as timestamps, with which they indicate when they'd like to be "woken" to perform time-related processing. For example, the Allocation state machine indicates, with a timestamp 5 minutes after an allocation is created, that it needs to be woken again in order to send the refresh message to the relay.
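To make the timestamp-as-timer pattern concrete, here is a minimal sketch. The `Allocation` struct below is a hypothetical stand-in, not connlib's actual type: its fields, the `handle_timeout` method, and the `REFRESH_INTERVAL` constant are assumptions made for illustration. Only the 5-minute refresh and the `poll_timeout` name come from the description above.

```rust
use std::time::{Duration, Instant};

/// Assumed refresh interval, taken from the "5 minutes" in the description.
const REFRESH_INTERVAL: Duration = Duration::from_secs(5 * 60);

/// Hypothetical stand-in for connlib's `Allocation`; the real type holds far more state.
struct Allocation {
    /// When this state machine next wants to be woken, if at all.
    refresh_at: Option<Instant>,
    /// Counts refresh messages "sent" so the behaviour is observable.
    refreshes_sent: u32,
}

impl Allocation {
    fn new(now: Instant) -> Self {
        Self {
            refresh_at: Some(now + REFRESH_INTERVAL),
            refreshes_sent: 0,
        }
    }

    /// The state machine never sleeps itself; it only reports a deadline.
    fn poll_timeout(&self) -> Option<Instant> {
        self.refresh_at
    }

    /// Called by the owner with the current time; does work only if the deadline passed.
    fn handle_timeout(&mut self, now: Instant) {
        match self.refresh_at {
            Some(deadline) if now >= deadline => {
                self.refreshes_sent += 1; // stands in for sending the refresh message
                self.refresh_at = Some(now + REFRESH_INTERVAL);
            }
            _ => {}
        }
    }
}
```

Because the state machine only reports when it wants to be woken, it can be driven in tests with synthetic `Instant`s and needs no real timer.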

When we reset our network connections, we discard pretty much all state within connlib and, together with it, all of these timers. The poll_timeout function then returns None, indicating that our state machines are not waiting for anything.

Within the event loop, the outermost state machine, i.e. ClientState, is paired with an Io component that actually implements the timer by scheduling a single wake-up at the earliest deadline reported by any of the state machines.
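The aggregation can be sketched as taking the minimum over all deadlines that are present. The `earliest_timeout` helper is a hypothetical name for illustration, not a function from connlib. If every state machine reports `None`, as happens right after a reset, the aggregate is `None` as well.

```rust
use std::time::Instant;

/// Hypothetical helper: picks the earliest deadline among nested state machines.
/// `None` entries (state machines that are not waiting for anything) are skipped.
fn earliest_timeout(timeouts: impl IntoIterator<Item = Option<Instant>>) -> Option<Instant> {
    timeouts.into_iter().flatten().min()
}
```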

In order to not fire the same timer multiple times in a row, we already intended to reset the timer once it fired. It turns out that this never worked and the timer still lingered around.

When we call reset, poll_timeout, which feeds this timer, returns None, and the timer doesn't get updated until poll_timeout returns Some with an Instant again. Because the previous timer wasn't cleared when it fired, it kept firing in the meantime. This caused connlib to busy-loop and prevented some(?) other parts of it from progressing, so we were never able to reconnect to the portal. Yet, because the event loop itself was still operating, we could still resolve DNS queries and such.
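The intended behaviour can be modelled with a synchronous sketch. The `Timer` type and its methods are assumptions made for illustration. The real Io component wraps an async timer, and the actual change in this PR may look different. The key point is that the deadline is discarded the moment it fires, so a later `None` from `poll_timeout` cannot leave an already-expired deadline behind.

```rust
use std::time::{Duration, Instant};

/// Hypothetical model of the timer held by the Io component.
struct Timer {
    deadline: Option<Instant>,
}

impl Timer {
    fn new() -> Self {
        Self { deadline: None }
    }

    /// Feeds the result of `poll_timeout` into the timer.
    /// As described above, `None` leaves the timer untouched.
    fn update(&mut self, next: Option<Instant>) {
        if let Some(deadline) = next {
            self.deadline = Some(deadline);
        }
    }

    /// Returns `true` once per deadline. Clearing the deadline here is the fix;
    /// without it, every later poll would report the same expired deadline again.
    fn poll_fired(&mut self, now: Instant) -> bool {
        match self.deadline {
            Some(deadline) if now >= deadline => {
                self.deadline = None; // discard the timer once it fired
                true
            }
            _ => false,
        }
    }
}
```

With the `self.deadline = None` line removed, `poll_fired` would return `true` on every call after a reset, which is the busy loop described above.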

Resolves: #7254.


@thomaseizinger thomaseizinger added this pull request to the merge queue Nov 8, 2024
Merged via the queue into main with commit 8653146 Nov 8, 2024
108 checks passed
@thomaseizinger thomaseizinger deleted the fix/reset-timer-no-timeout branch November 8, 2024 12:33
Successfully merging this pull request may close these issues.

Windows 1.3.10 fails to establish connections to CIDR and DNS resources