-
Notifications
You must be signed in to change notification settings - Fork 359
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve spawning of supervisor worker tasks #1656
Improve spawning of supervisor worker tasks #1656
Conversation
197135d
to
0ed7526
Compare
I have created #1664 to show the work in progress to handle the client expiry error in workers. Unfortunately it seems like the refactoring in this PR would not be sufficient to handle the error, as the connection and channel workers are spawned with one per chain, rather than one per IBC client. Without further re-architecture we are face with the difficult choice of either keep looping to handle expired clients, or abort entirely and not able to handle any other new connections or channels on the chain. More details are discussed in #1664. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚀
7c1803a
to
aad0e00
Compare
@romac I have added some tests to make sure that the connection and channel workers are working as expected. The packet worker is already verified to be working, since it is needed for the IBC transfer tests to pass. |
# v0.10.0 January 13th, 2021 This release notably updates the underlying CLI framework (`abscissa`) to version 0.6.0-beta.1, which now uses `clap` for parsing command line arguments. This substantially improves the UX of the CLI, by adding support for `--help` flags in subcommands and improving help and usage printouts. The `--version` option of the `create channel` subcommand has been renamed to `--channel-version`, with the old name still supported as an alias. Additionally, the `-h` short flag on many commands is now `-H` to avoid clashes with the clap-provided short flag for help. This release also improves the handling of account sequence mismatch errors, with a recovery mechanism to automatically retry or drop tx upon such errors. The relayer now also supports dynamic versions in channel open handshake (which is needed by Interchain Accounts), and enables full support for IBC v2. --- * Update package versions from v0.9.0 to v0.10.0 * Add changelog for #1656 * Bump `ibc-proto` version to 0.14.0 * Update guide wrt --help and --channel-version * Disambiguate between help and height flags by using `-H` for the latter * Update ibc-proto doc(html_root_url) * Remove outdated comment * Fix broken link in changelog * Rename application-handled -h CLI flags to -H (#1743) * Disambiguate between help and height flags by using `-H` for the latter * Enable clap-provided help flags on all subcommands Since all application-assigned short -h options have been renamed to -H, there is no need to suppress the -h flags provided by clap with the DisableHelpFlag setting. * Update changelog for #1743 * Remove a FIXME comment Resolved by e59bb13 Co-authored-by: Romain Ruetschi <romain@informal.systems> * guide: Removed wording about missing -h/--help The -h flags have been freed for built-in clap use and are supported on all subcommands. * Fix link to packet filtering policy in config page * Release changelog for 0.10.0 * Update changelog summary Co-authored-by: Romain Ruetschi <romain@informal.systems> Co-authored-by: Shoaib Ahmed <sufialhussaini@gmail.com> Co-authored-by: Mikhail Zabaluev <mikhail@informal.systems>
* Improve spawning of supervisor worker tasks * Use better names for worker tasks * Add integration tests for connection and channel workers * Fix typo * Reorder arguments in assert_eventually_succeed
# v0.10.0 January 13th, 2021 This release notably updates the underlying CLI framework (`abscissa`) to version 0.6.0-beta.1, which now uses `clap` for parsing command line arguments. This substantially improves the UX of the CLI, by adding support for `--help` flags in subcommands and improving help and usage printouts. The `--version` option of the `create channel` subcommand has been renamed to `--channel-version`, with the old name still supported as an alias. Additionally, the `-h` short flag on many commands is now `-H` to avoid clashes with the clap-provided short flag for help. This release also improves the handling of account sequence mismatch errors, with a recovery mechanism to automatically retry or drop tx upon such errors. The relayer now also supports dynamic versions in channel open handshake (which is needed by Interchain Accounts), and enables full support for IBC v2. --- * Update package versions from v0.9.0 to v0.10.0 * Add changelog for informalsystems#1656 * Bump `ibc-proto` version to 0.14.0 * Update guide wrt --help and --channel-version * Disambiguate between help and height flags by using `-H` for the latter * Update ibc-proto doc(html_root_url) * Remove outdated comment * Fix broken link in changelog * Rename application-handled -h CLI flags to -H (informalsystems#1743) * Disambiguate between help and height flags by using `-H` for the latter * Enable clap-provided help flags on all subcommands Since all application-assigned short -h options have been renamed to -H, there is no need to suppress the -h flags provided by clap with the DisableHelpFlag setting. * Update changelog for informalsystems#1743 * Remove a FIXME comment Resolved by e59bb13 Co-authored-by: Romain Ruetschi <romain@informal.systems> * guide: Removed wording about missing -h/--help The -h flags have been freed for built-in clap use and are supported on all subcommands. * Fix link to packet filtering policy in config page * Release changelog for 0.10.0 * Update changelog summary Co-authored-by: Romain Ruetschi <romain@informal.systems> Co-authored-by: Shoaib Ahmed <sufialhussaini@gmail.com> Co-authored-by: Mikhail Zabaluev <mikhail@informal.systems>
Description
This is a prerequisite for #1543 to make the spawning of supervisor worker tasks more manageable by parallelizing unrelated tasks.
A key challenge of the current design of the supervisor worker tasks is that we are relying on ad hoc multiplexing of multiple unrelated sub-tasks in a single thread. This result in various unintuitive hacks and bottlenecks:
.shutdown()
will cause the task to dangle.For the case of #1543, the main challenge is in multiplexing the client refresh and the detect misbehavior sub-tasks. When the client trusting period is reduced to tens of seconds, the misbehavior sub-task may interfere with the refresh sub-task to be able to refresh on time. Furthermore, in the case when a client is in fact frozen, it is difficult to modify the current code to abort that particular sub-task.
This PR introduce the
ibc_relayer::task
module, which encapsulate the logic of spawning background tasks and return aTaskHandle
. TheTaskHandle
provides methods to shutdown a background task, or wait for the task to terminate. TheTaskHandle
also implements aDrop
handler that always shutdown a background task when it is dropped, preventing dangling tasks.The
spawn_background_task
function accepts a stepper closure that is executed repeatedly, until the shutdown signal is received or until the stepper closure return abort signal or fatal error. The stepper closure returns aTaskError<E>
, which wraps around an error typeE
and contain 3 variants to indicate whether to abort, continue with an ignorable error, or abort with a fatal error. This design significantly simplifies how a step function should be implemented, without having to worry about the low level details of managing a background task.The supervisor code and its workers are refactored to use the new
task
module. The various sub-tasks are also split into dedicated tasks to allow them to run in parallel without interfering with each others. Due to the use of shared state in the existing code, some mutexes are introduced so that the same shared state can be shared across multiple parallel tasks.This PR does not introduce any change in logic aside from refactoring the supervisor code to use the
task
module. The work for fixing #1543 would instead be done in a separate PR that is based on this PR.PR author checklist:
unclog
.docs/
).Reviewer checklist:
Files changed
in the GitHub PR explorer.