fix(connlib): discard timer once it fired #7288

thomaseizinger · 2024-11-08T04:48:50Z

Within connlib, we have many nested state machines. Many of them keep internal timers in the form of timestamps that indicate when they'd like to be "woken" to perform time-related processing. For example, when an allocation is created, the Allocation state machine returns a timestamp 5 minutes in the future to indicate that it needs to be woken again in order to send the refresh message to the relay.
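To illustrate the pattern, here is a minimal sketch of such a state machine. It is not the real Allocation from connlib: the field refresh_at, the constant REFRESH_INTERVAL and the handle_timeout method are assumptions made up for this example; only the poll_timeout name and the 5 minute refresh come from the description above.

```rust
use std::time::{Duration, Instant};

/// Assumed refresh interval, taken from the "5 minutes" mentioned above.
const REFRESH_INTERVAL: Duration = Duration::from_secs(5 * 60);

/// Hypothetical stand-in for a nested state machine; not connlib's `Allocation`.
struct Allocation {
    /// When the next refresh is due. `None` means nothing is scheduled.
    refresh_at: Option<Instant>,
}

impl Allocation {
    fn new(created_at: Instant) -> Self {
        Self {
            refresh_at: Some(created_at + REFRESH_INTERVAL),
        }
    }

    /// Reports when this state machine would like to be woken.
    fn poll_timeout(&self) -> Option<Instant> {
        self.refresh_at
    }

    /// Called by the owner with the current time.
    /// Returns `true` if a refresh was due at `now`.
    fn handle_timeout(&mut self, now: Instant) -> bool {
        match self.refresh_at {
            Some(deadline) if now >= deadline => {
                // Sending the refresh message is omitted here;
                // we only schedule the next wake-up.
                self.refresh_at = Some(now + REFRESH_INTERVAL);
                true
            }
            _ => false,
        }
    }
}
```

The state machine never sleeps itself; it only reports a timestamp and relies on its owner to call back once that time has passed.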

When we reset our network connections, we pretty much discard all state within connlib and together with that, all of these timers. Thus the poll_timeout function would return None, indicating that our state machines are not waiting for anything.

Within the event loop, the outermost state machine, i.e. ClientState, is paired with an Io component that actually implements the timer by scheduling a wake-up at the earliest timestamp requested by any of the state machines.
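The aggregation step can be sketched roughly as follows. The helper earliest_timeout is a hypothetical name for this example and not a function from connlib; it only assumes that each nested state machine reports an Option of Instant.

```rust
use std::time::Instant;

/// Hypothetical helper: picks the earliest deadline among the nested state
/// machines. Idle state machines report `None` and are skipped.
fn earliest_timeout(
    timeouts: impl IntoIterator<Item = Option<Instant>>,
) -> Option<Instant> {
    timeouts.into_iter().flatten().min()
}
```

If every state machine is idle, the result is None, which is the situation right after a reset.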

In order to not fire the same timer multiple times in a row, we already intended to reset the timer once it fired. It turns out that this never worked and the timer still lingered around.

When we call reset, poll_timeout (which feeds this timer) returns None, and the timer doesn't get updated until poll_timeout returns Some with an Instant again. Because the previous timer wasn't cleared when it fired, connlib busy-looped and prevented some(?) other parts of it from progressing, so we were never able to reconnect to the portal. Yet, because the event loop itself was still operating, we could still resolve DNS queries and such.
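The failure mode can be reproduced with a simplified, synchronous sketch. The Timer type and its methods are hypothetical and do not mirror the actual Io code. The as_mut followed by take pattern in the buggy variant is an assumption based on the title of the clippy issue that mentions this PR: as_mut creates a temporary Option holding a mutable reference, and take only empties that temporary, leaving the stored deadline in place.

```rust
use std::time::Instant;

/// Hypothetical, simplified stand-in for the timer held by the `Io` component.
struct Timer {
    deadline: Option<Instant>,
}

impl Timer {
    /// Re-arms the timer. A `None` (nothing to wait for) leaves the current
    /// deadline untouched, as described above.
    fn reset(&mut self, next: Option<Instant>) {
        if let Some(deadline) = next {
            self.deadline = Some(deadline);
        }
    }

    /// Buggy variant: `as_mut()` yields a temporary `Option<&mut Instant>`,
    /// so `take()` clears the temporary and `self.deadline` stays `Some`.
    fn poll_buggy(&mut self, now: Instant) -> Option<Instant> {
        let deadline = self.deadline?;
        if now < deadline {
            return None;
        }
        self.deadline.as_mut().take().map(|d| *d)
    }

    /// Fixed variant: `take()` on the field itself discards the deadline
    /// once it has fired.
    fn poll_fixed(&mut self, now: Instant) -> Option<Instant> {
        let deadline = self.deadline?;
        if now < deadline {
            return None;
        }
        self.deadline.take()
    }
}
```

With the buggy variant, a timer that has fired once keeps reporting itself as ready on every poll for as long as reset only receives None, which matches the busy loop described above. The fixed variant returns None after the first firing until a new deadline is set.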

Resolves: #7254.


Clear timer after it fired

9cedf62

thomaseizinger requested a review from jamilbk November 8, 2024 04:48

Add changelog entries

19871b9

vercel bot deployed to Preview November 8, 2024 04:51 View deployment

thomaseizinger enabled auto-merge November 8, 2024 04:53

jamilbk approved these changes Nov 8, 2024

thomaseizinger disabled auto-merge November 8, 2024 04:58

Add unit test

77bd6b8

thomaseizinger enabled auto-merge November 8, 2024 05:05

vercel bot deployed to Preview November 8, 2024 05:06 View deployment

Merge branch 'main' into fix/reset-timer-no-timeout

7771d40

vercel bot deployed to Preview November 8, 2024 12:05 View deployment

thomaseizinger added this pull request to the merge queue Nov 8, 2024

Merged via the queue into main with commit 8653146 Nov 8, 2024
108 checks passed

thomaseizinger deleted the fix/reset-timer-no-timeout branch November 8, 2024 12:33

thomaseizinger mentioned this pull request Nov 9, 2024

Option::as_mut followed by Option::take is a foot-gun rust-lang/rust-clippy#13671

Closed
