-
Notifications
You must be signed in to change notification settings - Fork 2.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix worker event loop ref/unref + leak #4114
Conversation
src/bun.js/module_loader.zig
Outdated
worker.queueInitialTask(); | ||
} | ||
} | ||
// defer { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
worker.queueInitialTask was removing an eventloop ref way too early. since the ref was now based on the worker's event loop, the initial task to unref isn't needed i think. did not mean to leave as commented code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code here was removed instead of commented
src/bun.js/web_worker.zig
Outdated
continue; | ||
while (true) { | ||
while (vm.eventLoop().tasks.count > 0 or vm.active_tasks > 0 or vm.uws_event_loop.?.active > 0) { | ||
this.eventloop_poll_ref.refConcurrently(this.parent); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should not be here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i meant to have this at the end of the while(true) (didnt push that commit because was looking into other bugs)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should not call .refConcurrently() on every tick of the event loop? It should only need to call it once while there are pending tasks.
src/bun.js/web_worker.zig
Outdated
this.onTerminate(); | ||
this.eventloop_poll_ref.unrefConcurrently(this.parent); | ||
vm.eventLoop().tickPossiblyForever(); | ||
this.parent.eventLoop().wakeup(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is a race condition here where the parent event loop is unref'd
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for the case where the parent event loop is not the main thread
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh i did not think about workers starting other workers
❌ @paperdave 2 files with test failures on bun-darwin-aarch64:
|
❌ @paperdave 1 files with test failures on linux-x64:
|
❌ @paperdave 1 files with test failures on linux-x64-baseline:
|
❌ @paperdave 4 files with test failures on bun-darwin-x64-baseline:
|
d6cf039
to
e9733e6
Compare
src/bun.js/base.zig
Outdated
@@ -2122,6 +2122,11 @@ pub export fn MarkedArrayBuffer_deallocator(bytes_: *anyopaque, _: *anyopaque) v | |||
// zig's memory allocator interface won't work here | |||
// mimalloc knows the size of things | |||
// but we don't | |||
if (comptime Environment.allow_assert) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[existing bug reproduced by sveltekit build. this assertion makes it fail consistantly]
@@ -82,5 +82,3 @@ import { build, buildSync, transform, transformSync } from "esbuild"; | |||
throw new Error("Test failed."); | |||
} | |||
} | |||
|
|||
process.exit(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably was not needed, but a reason to keep this is that the process should exit since there are no ref'd polls.
bug with |
This PR adds documentation about compiling a specific version of ZLS that works better which is after the Install zig part of the docs. But the version of zig installed by the install zig section (0.11.0-dev.4004+a57608217) is too old to compile the specific version of ZLS (v0.11.0-dev.4332+43b830415 is required). Perhaps its worth adding something that mentions this if this is intended. 78defe7#diff-f0c08b47e6be8e8b574a71f652c3c2c66f49d157b392f3091c58e8585d411010R183-R202 |
What does this PR do?
Fixes worker tests but it makes these changes to WebWorker.
was reimplemented a little on aug 11:
.ref()
, but workers will exit if their event loop dies.addEventListener("message")
on a message port or the global, this refs the event loop.worker.unref()
this no longer makes the worker's event loop affect the caller's. This is what esbuild depends on.I solved it by assigning vm to null when the vm is done with, then the while(true) loop for the worker (the source of the UAF) would call real deinit.It's not really that complex with the changes to the worker event loop..terminate
sets a flag to tell the worker to exit the event loop, and the deinit is called from withinspin()
on the Worker's thread.I'd like a close review to make sure i did everything right. I triple checked my uses of event loop refs and memory usage in the worker zig file.
How did you verify your code works?
I ran unit tests. For testing large scale worker event loop stuff.
Pipe this to a file to see that exactly 1000 lines are output.