[move] Implement and enable on-chain package hooks #19613
Conversation
@amnn I have rebased and fixed clippy, test-extra, and cargo-deny (advisory, fixed by rebase). I can't reproduce the TS e2e test and split cluster check failures (they all pass on my machine), nor the cargo test failure (I consistently have other failures locally, even on trunk).
Thanks @kklas, some of these tests may be flaky -- I've triggered another run, and we can try retrying them to see.
Got a chance to take a high level look at the PR @kklas, thanks for this.
The main thing I'm thinking/worrying about is this whole async story (I'm sure it's the same for you). I have a mental model for how I think things "should" work, and I'm curious to understand from you where it falls apart:
I was hoping that for the most part, the fact that a package was being resolved using the package hooks would be hidden from the rest of the system, because we would fetch the packages during dependency resolution, and then from that point on, we refer to it as a file location. Based on that, outside of dependency resolution, we don't have to worry about this async API complication.
Maybe this is true and the issue is that we need to run dependency resolution in more places. If so, there are some tricks we can use to make the async API a bit more palatable. For example, we can implement the hook using channels:
- When you register the hook, you spin up a separate task that handles requests from the hook.
- It talks over channels, so it waits for a request on one channel, does the work (fetching and writing packages), and then returns on another channel.
- The hook implementations themselves can be non-async, communicating over the channels using their blocking APIs.
Code organisation-wise, we will want to move all this hook stuff out into its own crate, because the sui crate is quite large, and we don't want to introduce it as a dependency in a bunch of places if we don't have to. The package hooks' dependencies seem quite modest, so it would be fine to include just those.
@amnn my understanding is this: I agree we shouldn't have to worry about it, and we don't if the other code that depends on the hooks is not async / concurrent. But if it is, then it's an issue, because what you're trying to do is run blocking code (all package hook functions are non-async) that encapsulates async, IO-bound code -- basically doing HTTP requests.
So this is the conundrum, and I don't think using channels does anything fundamental to alleviate it -- it would rather create more issues. E.g. the "build dependency graph" call is sync; it will call into a package hook, which will send something over a channel and block waiting for the response. This can deadlock, because the hook task might be running on the same thread as the one handling the requests from the hooks (and if you want to use tokio channels instead of std, then we're back at square one because you can't await in the hook). Using channels for this seems like an anti-pattern and more complex in general. After thinking more about it, I came up with another solution that works:
So instead of trying to use the existing runtime with
The downside, though, is the cost of spawning new threads, but I guess we can consider this similar to the overhead of running an external resolver command, which is a separate process. Ultimately, the issue IMO is the design of the dependency graph. If we want to be able to resolve packages asynchronously -- which we do, basically, because all the sui clients we need are async -- the dependency graph needs to support that natively. So with all of this separate thread creation etc. we're just dancing around the issue. I'm not sure about the design decision to keep async out of the dependency graph, but maybe it's time to revisit it. Another issue is that PackageHooks are global once registered, which also creates problems in concurrent scenarios if you want to run them with different settings concurrently.
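To make the trade-off concrete, here is a minimal, std-only sketch of the "do the blocking work on a fresh thread" idea described above. All names are hypothetical (the real hook would block on an async fetch via a runtime it owns); the point is only that the caller's thread, and therefore any async runtime it belongs to, is never re-entered:

```rust
use std::thread;

// Hypothetical stand-in for the async, IO-bound fetch the real hook performs.
fn fetch_package_blocking(id: &str) -> String {
    format!("package-bytes-for:{id}")
}

// A non-async hook entry point: spawn a dedicated thread for the blocking
// work and join it, so we never block inside the caller's async runtime.
fn resolve_on_chain_dependency(id: &str) -> String {
    let id = id.to_string();
    thread::spawn(move || fetch_package_blocking(&id))
        .join()
        .expect("resolver thread panicked")
}

fn main() {
    let resolved = resolve_on_chain_dependency("0x2");
    assert_eq!(resolved, "package-bytes-for:0x2");
    println!("{resolved}");
}
```

The per-call thread spawn is exactly the overhead discussed above; it buys freedom from the caller's runtime at the cost of a thread per resolution.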
Makes sense! Will do that!
When I originally mentioned using channels, what I had in mind was the following, which I think gets around your concern around the task being potentially scheduled on the same thread as the caller, and your other concern about creating lots of short-lived threads:
You can simulate a function call with channels using a pattern like this:

```rust
async fn foo(x: Param) -> Ret {
    /* ... */
    bar
}
```

Becomes (roughly -- I haven't run this):

```rust
static ACTOR: OnceCell<Actor> = OnceCell::new();

struct Actor {
    call: mpsc::Sender<(Param, oneshot::Sender<Ret>)>,
    rt: Runtime,
}

fn foo(x: Param) -> Result<Ret> {
    let call = ACTOR.get().expect("hook not registered").call.clone();
    let (tx, rx) = oneshot::channel();
    call.blocking_send((x, tx))?;
    Ok(rx.blocking_recv()?)
}

fn register_hook() {
    let rt = Builder::new_multi_thread()
        .enable_all()
        .build()
        .expect("Couldn't set up runtime for actor");

    // Support some moderate level of concurrency (buffer for four
    // requests).
    let (tx, mut rx) = mpsc::channel(4);

    rt.spawn(async move {
        loop {
            let (x, tx) = rx.recv().await.expect("tx can't close");
            /* ... */
            // Ignore the error from this -- it means the rx side was
            // dropped, but that is not this actor's concern.
            let _ = tx.send(bar);
        }
    });

    // NB. This will complain if you register the hook twice, so test
    // scenarios may be complicated.
    ACTOR.set(Actor { call: tx, rt }).expect("Failed to set up actor");
}
```

Tokio has some resources on this problem: https://tokio.rs/tokio/topics/bridging -- this approach is the one suggested there.
But to your point here:
Yes, we do need to redesign dependency resolution with all its new constraints in mind. Package Hooks predate Sui, which uses quite a different philosophy for packages compared to what came before it. Once MVR and this work have landed, I'm hoping we can pick up that redesign work internally.
@amnn and I continued the async discussion over another channel and agreed that the solution I implemented here is a reasonable one considering the trade-offs.
This PR is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 7 days.
Description
This PR implements and enables on-chain package hooks, which make it possible to build against on-chain dependencies specified in the manifest.
The hooks are implemented as follows:
- `resolve_on_chain_dependency`: the on-chain package is fetched into `~/.move/<id>` by generating a manifest and saving the modules
- `custom_resolve_pkg_id`:
  - if the lockfile has an `original-published-id` for the current `env` entry, that is used
  - otherwise, we try to resolve by looking up the `published-at` field in the manifest -- the package is fetched and the `id` is resolved as the self id of any module in that package (this is also how the on-chain packages generated with `resolve_on_chain_dependency` are resolved, with caching to avoid duplicate fetching)
  - if there's no `published-id` field either, we assume the package hasn't been published and resolve the `id` as its manifest name
- `custom_resolve_version`: resolved from the `version` field in `env` (in case of lockfile) or from the `version` value of the fetched package (in case of `published-at`), and otherwise `None` (see https://github.com/MystenLabs/sui/blob/main/external-crates/move/crates/move-package/src/resolution/dependency_graph.rs#L135-L152 for more details)

Since packages are now resolved by their `(original-id, version)` pair instead of manifest name, this effectively makes it possible to consolidate both source and on-chain dependencies under a single dependency graph. The reasoning and principles behind this design are outlined here: https://gist.github.com/kklas/d9edad7875548fae57b0c86bc8871fea

cc @rvantonder @amnn #14178
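For illustration, an on-chain dependency of the kind this PR enables is declared by its package id in the manifest; the package name and address here are placeholders taken from the example in the release notes below:

```toml
[dependencies]
# "Foo" and the address are hypothetical; `id` points at the on-chain package
Foo = { id = "0x123" }
```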
Notes and known limitations
- The async stuff in `SuiPackageHooks` is a bit hazardous. The gist of it is that the hook methods are required to be non-async, but we need to use the `SuiClient` within them, both for fetching the on-chain packages and for resolving package versions and ids based on `published-at`. The way I got around it is that I run the future in `tokio::task::block_in_place`, which will inform the existing executor (in case we're already running within tokio) about the blocking task, so that it hands off any other tasks to another thread. Then within it we dispatch the future on a separate runtime that we control. This works and will not starve other tasks, but it can lead to weird behaviour, slowness, or even deadlocks in some cases when multiple pieces of code run concurrently within the same task (which is facilitated by macros like `join!` and `select!`). I took a quick look at the codebase and it seems this won't be an issue, because the sui CLI doesn't use that kind of concurrency for commands, `sui-move-lsp` doesn't use tokio / async, and `sui-source-validation-service` runs all build tasks using `tokio::spawn`, which will be OK. Nonetheless, I think the hooks should be refactored to be async ASAP (and ideally also made non-global)!
- Since the packages are now identified by their `(original-id, version)`, the dependency graph section in the lockfiles is now env specific. This means that if the lockfile was generated for the testnet env, having that package as a dependency and building on mainnet will lead to inconsistencies, as testnet ids will be loaded into the graph. Perhaps there should be an optional `chain_id` field in the lockfile so that it can be ignored when there's a mismatch?
- When using on-chain dependencies it's not possible to specify which env they correspond to, so these manifests will only work for a single env.
- `move-analyzer` (to my surprise) works with bytecode deps (based on a smaller project I tested it on; it could have issues elsewhere), but it's not possible to change the `env` after it starts. Ideally it would pick up changes from `WalletContext` or have a setting for selecting the `env` in the editor menu.

Test plan
Added an integration test, tested manually on large-ish projects.
Release notes
Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required.
For each box you select, include information after the relevant heading that describes the impact of your changes that a user might notice and any actions they must take to implement updates.
`Move.toml` (e.g. `Foo = { id = "0x123" }`).