Proposal: blank worker #6911

Open
wanderview opened this issue Jul 28, 2021 · 41 comments
Labels
addition/proposal New features or enhancements needs implementer interest Moving the issue forward requires implementers to express interest topic: workers

Comments

@wanderview
Member

Blank Worker Explainer

Introduction

The web platform currently requires DedicatedWorker and SharedWorker scripts to be same-origin to the parent context creating them. This is largely motivated by the desire to avoid some of the issues associated with the creation of cross-origin iframes.

This restriction, however, creates a common headache for web developers. They often have scripts hosted on cross-origin CDNs. They cannot directly use these scripts to create a DedicatedWorker or SharedWorker. Instead they must use a workaround like:

const blob = new Blob(['importScripts("https://cdn.example/my/worker/script.js")'],
                      { type: 'text/javascript' });
const blobURL = URL.createObjectURL(blob);
const worker = new Worker(blobURL);

This works, but it is a persistent paper cut for web developers. It makes something that should be easy, complicated and non-obvious. It also risks leaking the blob URL if the code does not later call revokeObjectURL(). It also invokes a lot of complicated machinery in the browser to persist and load the blob. This overhead should not be necessary.
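For reference, the workaround can be wrapped in a small helper. This is only a minimal sketch: the function names buildBootstrapSource and createCrossOriginWorker are hypothetical, and only Blob, URL.createObjectURL() and Worker are existing APIs. The blob URL is handed back to the caller so that it can be revoked later.

```javascript
// Hypothetical helper names; only Blob, URL.createObjectURL and Worker are real APIs.

// Build the one-line bootstrap script. JSON.stringify quotes and escapes the URL
// so that it cannot break out of the string literal.
function buildBootstrapSource(scriptURL) {
  return `importScripts(${JSON.stringify(scriptURL)});`;
}

// Create a same-origin worker whose only job is to import the cross-origin script.
// The blob URL is returned so the caller can revoke it when it is no longer needed.
function createCrossOriginWorker(scriptURL) {
  const blob = new Blob([buildBootstrapSource(scriptURL)], { type: 'text/javascript' });
  const blobURL = URL.createObjectURL(blob);
  const worker = new Worker(blobURL);
  return { worker, blobURL };
}
```

A caller would keep the returned blobURL and pass it to URL.revokeObjectURL() once the worker has started, which is exactly the bookkeeping this proposal aims to remove.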

This effort proposes to improve the situation by providing two features that are available in iframes, but missing in DedicatedWorker and SharedWorker today:

  1. The ability to create a blank context.
  2. The ability to append scripts to an existing context.

With this proposal to provide these features, the example above could instead be written:

const worker = new Worker();
worker.executeScript('https://cdn.example/my/worker/script.js');

Goals

  • Provide a more convenient way to construct DedicatedWorker and SharedWorker threads using cross-origin scripts.
  • Allow more easily running multiple scripts in a single worker, from the "outside" (without needing a custom postMessage signaling solution)
  • Better align worker and iframe behavior and infrastructure.

Non-Goals

  • This effort is explicitly not interested in creating cross-origin DedicatedWorker or SharedWorker contexts where their self.origin differs from their owner's self.origin.

Web APIs

This proposal includes two distinct API changes. In theory these are somewhat orthogonal, but we need both to address the motivating use case.

Blank Worker Construction

This API change simply provides a default constructor that has no script URL argument. So:

const w = new Worker();
const sw = new SharedWorker({ name: 'foo' });

Workers constructed in this way have a script URL of about:blankjs. The origin, policy container, service worker controller, etc. of the owner are inherited by the worker context just as a child about:blank iframe inherits them from its parent. The about:blankjs resource will be considered to have a text/javascript MIME type, while about:blank has a text/html MIME type.

Owner Initiated Script Execution

This API change proposes to allow the owning context to initiate script execution in the worker context.

const w = new Worker();
await w.importScripts(scriptURL);

This API could also support running modules:

const w = new Worker({ type: 'module' });
await w.addModule(scriptURL);

Alternatively we could instead expose a single w.executeScripts(url, { type }) method.

These methods would act as if they sent a postMessage() to the worker which then invoked importScripts() or addModule() in the worker context. It would then postMessage() back to the owning context, indicating that the script execution was completed. This would then resolve the promise returned from w.executeScripts().

Notably, this postMessage()-like behavior means that multiple calls to executeScript() would be queued. Modules that use top-level await could interleave, but otherwise all scripts would run in the order they were sent.
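The queuing behavior can be modeled roughly as follows. This is a sketch under stated assumptions, not the proposed implementation: QueuedScriptRunner is a hypothetical in-memory stand-in for the worker's message queue, its executeScript() only enqueues (as a postMessage() would), and drain() plays the role of the worker's event loop. It does not model promises or the interleaving of modules that use top-level await.

```javascript
// Hypothetical stand-in for a worker's message queue; not a real browser API.
class QueuedScriptRunner {
  constructor(runScript) {
    this.runScript = runScript; // callback that "executes" a URL inside the worker
    this.pending = [];          // FIFO of requests sent by the owner
    this.completed = [];        // URLs in the order they actually ran
  }

  // Owner side: models w.executeScript(url). It only enqueues, like postMessage().
  executeScript(url) {
    this.pending.push(url);
  }

  // Worker side: models the event loop handling the queued requests in order.
  drain() {
    while (this.pending.length > 0) {
      const url = this.pending.shift();
      this.runScript(url);
      this.completed.push(url);
    }
  }
}
```

Nothing runs at the time executeScript() is called; the scripts run later, in the order they were sent.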

Considered Alternatives

The main alternative that is typically suggested is to simply allow new Worker() and new SharedWorker() to take cross-origin scripts. We don't want to do this for a couple of reasons.

First, we don't want to support cross-origin workers at the moment. We are still dealing with the long tail of consequences of allowing cross-origin iframes. If necessary, code can construct a cross-origin iframe which can then create its own worker.

Second, we don't want to support cross-origin scripts while keeping the worker same-origin to its owner because it would create a very exceptional loading situation. Today all contexts and javascript globals have an origin that matches the origin of their loading resource. Breaking this constraint would create an exceptional case in the browser which could lead to unexpected security issues.

Privacy & Security Considerations

This proposal does not store any user data or expose any information about the client to the server. It's mainly an ergonomic API change for something that is already achievable through the blob API. There should not be any privacy impact from this proposal.

In terms of security, however, there may be a few items to discuss.

First, it may be controversial to create a new special URL type like about:blankjs. One could argue we should instead use about:blank itself. That would be problematic, however, since about:blank has a text/html mime type. In addition, about:blank has numerous weird behaviors (initial about:blank, replacement, fragments, etc) that will not be supported in about:blankjs. We do not want to propagate these unusual features to workers and it would be another weirdness for about:blank to work inconsistently.

Second, it is possibly concerning that the owner can inject script into the worker at any time. This would be a new capability that existing scripts may not be expecting. We argue, however, that the owner/worker division is not a security boundary. The owner and worker already share storage, network cache, service workers, etc. There are many ways for the owner to attack the worker context if it wanted to.

In addition, it seems likely an owner could already use blob URLs to build the same behavior we are proposing here and inject script into a target worker thread whenever it wants: instead of calling new Worker(originalURL) directly, it would execute a blob URL containing a script execution framework plus importScripts(originalURL). Same-origin scripts can potentially defend against this with CSP, but again there are many other ways for the owner to attack the worker script via poisoned storage, cache, etc.

Acknowledgements

Thank you to @domenic and @surma for reviewing and contributing to this explainer.

@developit

FWIW I don't think w.executeScripts(url, { type }) would be compatible with the existing constraints on Workers: Workers are instantiated either with type:module or as classic workers, and that affects their WorkerGlobalScope - classic workers have access to importScripts(), whereas Module Workers do not - moving the script type to a later stage doesn't allow for that differentiation.

One thing I didn't spot in the proposal - IIRC cross-origin works fine via CORS for module workers today, so the pain point here is limited to classic workers?

@wanderview
Member Author

I don't think the top level script, whether classic or module, can be cross-origin to the owning context.

@domenic
Member

domenic commented Jul 28, 2021

FWIW I don't think w.executeScripts(url, { type }) would be compatible with the existing constraints on Workers: Workers are instantiated either with type:module or as classic workers, and that affects their WorkerGlobalScope - classic workers have access to importScripts(), whereas Module Workers do not - moving the script type to a later stage doesn't allow for that differentiation.

This is a good point. I guess it argues for a unified API like w.todoComeUpWithName(url) which treats the script as a module script for module workers and a classic script for classic workers.

Naming considerations:

  • worker.addScript() seems bad because of a false comparison with worklet.addModule(). worker.addScript() would make people think maybe it's only classic scripts.

  • worker.importScript() or importScripts() seems bad because workerGlobalScope.importScripts() only works for classic scripts, whereas worker.todoComeUpWithName() has varying behavior.

  • Maybe we could get away without a noun at all, e.g. worker.execute(url) or worker.run(url).

  • worker.import(url) is a bit confusing but in both directions: it is reminiscent of (classic-only) importScripts() and (module-only) import(). Maybe that balances it out? Or maybe it's just bad.


An alternate direction would be to only support modules. I guess you would say new Worker() with no args is always module; there's no type option at all. Then you would use the worklet-symmetric name addModule(). This might make things harder to migrate though, from existing blob URL/importScripts-based solutions.

@annevk annevk added addition/proposal New features or enhancements needs implementer interest Moving the issue forward requires implementers to express interest topic: workers labels Jul 29, 2021
@annevk
Member

annevk commented Jul 29, 2021

  1. Do we need to reuse Worker and SharedWorker or could we also create something new? Service worker interception would work differently for this as well, for instance, so I'm not sure that overloading is the way to go for what seems to be a completely novel type of worker.
  2. Assuming we can create something new, should we also look at the kind of things @surma was playing with for communication? Or at least ensure we do not prevent future additions along those lines?

I don't really understand how

Better align worker and iframe behavior and infrastructure.

works given that the origin would still be the origin of whoever created the worker. Or is this mainly about the about:blank behavior and that you will have to inherit all the policies? (I guess it's nice we have policy container now, though I'm not sure it contains all the necessary policies yet.)

@surma
Contributor

surma commented Jul 29, 2021

This is a good point. I guess it argues for a unified API like w.todoComeUpWithName(url) which treats the script as a module script for module workers and a classic script for classic workers.

import() is available in classic workers, while importScripts() is not available in module workers. So a classic worker can load both modules and classic scripts, while a module worker can only load modules. I feel like we should continue to give developers the ability to do either.

Would it be viable to give a module worker a different API than a classic worker? For example, classic workers having both importScripts and addModule, while module workers only have addModule? Alternatively, we could expose both and simply throw when trying to run a classic script in a module worker. Not sure if that is good design, though.

(Same applies to execute(url, {type}), I just prefer the dual method API.)
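The "throw when trying to run a classic script in a module worker" option could look roughly like this. It is a sketch only: checkLoadAllowed is a hypothetical function that models the rule described above and is not part of any existing or proposed API.

```javascript
// Hypothetical model of the owner-side rule discussed above; not a real API.
// workerType: how the worker was created ('classic' or 'module').
// scriptKind: what the owner asks it to run ('classic' or 'module').
function checkLoadAllowed(workerType, scriptKind) {
  const known = ['classic', 'module'];
  if (!known.includes(workerType) || !known.includes(scriptKind)) {
    throw new TypeError('workerType and scriptKind must be "classic" or "module"');
  }
  // Mirrors the in-worker situation described above: a classic worker can load
  // both kinds, while a module worker can only load modules.
  if (workerType === 'module' && scriptKind === 'classic') {
    throw new TypeError('classic scripts cannot be run in a module worker');
  }
  return true;
}
```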

@surma
Contributor

surma commented Jul 29, 2021

Do we need to reuse Worker and SharedWorker or could we also create something new? Service worker interception would work differently for this as well, for instance, so I'm not sure that overloading is the way to go for what seems to be a completely novel type of worker.

Are ServiceWorker reasonable to consider for an API like this? Their caching behavior is tied to scripts loaded in the first tick (isn’t it?), so my hunch is that the usefulness in a SW context would be quite limited.

Assuming we can create something new, should we also look at the kind of things @surma was playing with for communication? Or at least ensure we do not prevent future additions along those lines?

I’d love to help with this, although I feel like it’s orthogonal to this. Do you think it’s something that could succeed in standards space, considering that it can be solved in user land?

@annevk
Member

annevk commented Jul 29, 2021

I'm not saying that it should be a service worker, but that the way service worker interception/selection works for dedicated and shared workers cannot work for a worker that is not started with a URL. (It would have to reuse the service worker from whoever created the instance probably.) This also goes for CSP, Referrer Policy, etc.

It's not clear to me we should continue to support classic scripts. Isn't the idea that everything new is module scripts?

As for the ergonomic API, I want to make sure that if we create something new we could add that later if desired. Also, there are a lot of things that can be done in user land, but have been added to standards to make things easier for developers. So yeah, I think it could succeed.

@RReverser
Member

re:

Second, we don't want to support cross-origin scripts while keeping the worker same-origin to its owner because it would create a very exceptional loading situation.

I don't understand how this is different from the proposed explicit methods, which seem to do just that - load cross-origin scripts into a same-origin worker.

Instead of allowing empty worker + coming up with an extra method name, can we expose this as an option on existing Worker constructor like

new Worker('http://...', {
  type: 'module',
  crossOrigin: true // or 'anonymous' or something else
})

Functionally it would do the same as proposed methods, but seems more ergonomic than splitting up the Worker creating and script loading steps.

@annevk
Member

annevk commented Jul 29, 2021

That would not work with service worker selection. Similar to navigation that relies on it being same origin.

@wanderview
Member Author

wanderview commented Jul 29, 2021

Do we need to reuse Worker and SharedWorker or could we also create something new? Service worker interception would work differently for this as well, for instance, so I'm not sure that overloading is the way to go for what seems to be a completely novel type of worker.

I'm strongly opposed to creating a new worker type for this. I really don't even understand the motivation around that. The lifetime of the worker is unchanged by this proposal which seems to be the defining factor of the different worker types. (Single owner vs multiple owner vs background ephemeral.)

Also, I don't think service worker interception would work differently at all. It would be just like how an about:blank iframe works (in Firefox and the spec). The about:blank URL itself is not intercepted and the controller is inherited from the parent. All subresource requests for the worker (like script imports) would be intercepted.

I guess I didn't say it in the explainer, but w.importScripts() would trigger a load that happens from the worker's context. It would message the worker thread and start the load using the worker's context. So service worker interception would happen normally.

I don't really understand how

Better align worker and iframe behavior and infrastructure.

works given that the origin would still be the origin of whoever created the worker. Or is this mainly about the about:blank behavior and that you will have to inherit all the policies? (I guess it's nice we have policy container now, though I'm not sure it contains all the necessary policies yet.)

In my view there are two capabilities iframes have that workers lack that makes this situation worse.

  1. The ability to easily create a child that is empty, but inherits current origin, policy, service workers, etc.
  2. The ability to easily mutate a child to customize it programmatically.

Adding these two capabilities to workers is what I meant by aligning with iframes.

Edit: I think it's because of these capabilities that we rarely see people making blob URL iframes. It's just easier to make a blank one and mutate it. Let's make it that easy for workers too.

Are ServiceWorker reasonable to consider for an API like this?

I don't think we should allow service workers with about:blankjs script URLs. For one, we don't allow other local URL types like data and blob. Second, there would be no way to persist any customizations.

I also wouldn't add the external import capability to service workers since it would fail outside of the install phase.

@asutherland

This seems like a nice way to pave the way towards reducing/eliminating URL.createObjectURL and to make it easier for tracking protection to block tracking worker globals without having to actually spawn the worker first!

@Kaiido
Member

Kaiido commented Jul 30, 2021

I second #6911 (comment), I really don't see the point of having this get split in two new features.
What use case would be solved by importing multiple scripts from the owner?
Don't get me wrong, I see very well the point for being able to execute a Worker from a script hosted by a different origin, which is the main use case of this whole proposal, I just don't see why one would need to load multiple such scripts.
If the main script has dependencies, it will load it itself.

If there is no need to load multiple scripts, then why have a method at all?

The constructor option seems largely enough, and that would make everything so much easier to have just one new option in the constructor instead of (several?!) new methods with unclear behavior (it might be hard for developers to understand when are these scripts loaded and executed. Should the worker script get a new set of events to let know when the external import of a script succeeded or failed etc.)...


And regarding the will to reduce the use of createObjectURL, I suspect (though I have no numbers) that the biggest use of blob:URLs for Workers actually comes from authors who prefer to distribute a single script to their users and will store the Worker's script as a function in the main script and then stringify it into a Blob to start their Worker.
For this, I guess the equivalent of about:srcdoc would be more useful, (and it could also make the initial use-case of this proposal easier than what it is today), though that would conflict with some CSPs.

@wanderview
Member Author

wanderview commented Jul 30, 2021

I don't understand how this is different from the proposed explicit methods, which seem to do just that - load cross-origin scripts into a same-origin worker.

To me avoiding confusion about the origin of the created context is important. In all cases I'm aware of on the web platform, the primary loaded URL determines the origin of the context. Using the proposed snippet above would confuse this situation IMO. It makes the cross-origin URL look like the primary URL of the worker and therefore suggests it should get the other origin. Adding a crossOrigin: true option doesn't really fix this confusion, but just makes it look more intentional.

The blank worker proposal purposely separates the creation of the worker from loading the subresource in the worker in order to avoid this confusion. Yes, it's an extra statement to call, but I think that's worth it to make the conceptual model clear.

Also, there is value in being able to load script in a worker multiple times. It gets us closer to supporting efficient coroutines. For example, you could imagine w.importScripts() or its equivalent being used with the js blocks proposal to do this.

@domenic
Member

domenic commented Jul 30, 2021

If there is no need to load multiple scripts, then why have a method at all?

I think there's a big need to load multiple scripts: we want to encourage people to reuse workers (threadpool-style), instead of creating new ones for each script.

@RReverser
Member

If there is no need to load multiple scripts, then why have a method at all?

I think there's a big need to load multiple scripts: we want to encourage people to reuse workers (threadpool-style), instead of creating new ones for each script.

In my experience, more often than not Workers expect to be self-contained - e.g. they will register onmessage handler without expecting any messages in a format different from theirs, so loading multiple such sources will fail spectacularly.

On the other hand, if loaded modules are intended to be used together and are aware of each other, then, as @Kaiido pointed out, this can be solved by creating a separate entry point that simply imports both of the required sources. It adds a request indirection, but IMO 1) that's fine for the less common case of multiple modules per Worker and 2) will be usually bundled in prod anyway, while 3) it would improve API for the more common usecase of loading a single module.

@wanderview
Member Author

Well, I do expect people would need to write to the new feature in order to take advantage of it. And I don't think the new feature is adding a lot of new risk since I expect most worker scripts do not protect themselves from being imported into a different top level script. (I think you can prevent this with CSP, but I have not verified.)

@Kaiido
Member

Kaiido commented Jul 31, 2021

there is value in being able to load script in a worker multiple times. It gets us closer to supporting efficient coroutines. For example, you could imagine w.importScripts() or its equivalent being used with the js blocks proposal to do this.

I think there's a big need to load multiple scripts: we want to encourage people to reuse workers (threadpool-style), instead of creating new ones for each script.

Could these motivations and any others be included in the proposal (be it as a link to previous discussions if any), so we can all see it from a common ground?

Currently the only use case exposed by the proposal reads

  • I want to host my Worker script on a cross-origin CDN.

This is a fairly common case indeed, and it would be great to solve this.
However to solve this only, no need for a Swiss-Army-Knife proposal. A simple option (whatever the name of the option) at construction will get it done, simple to spec, simple to use.

I don't really see how the last two goals got derived from this use case.

Now if this proposal must really encompass these three goals together, I'd like to understand better how this would work by means of examples. I guess that indeed we will need to develop an entirely new way of writing Worker scripts so that they can handle new parts of scripts being inserted at any time.
I know there is a precedent with Worklet.addModule(), but I must admit I'm not used to it enough to see what it means exactly in terms of usage. The only times I personally used it to load several modules were because Firefox doesn't support import from inside the Worklet. So, from outside, I just awaited the first module being added before adding the next one... I never saw it being used for "injection during execution".

Failing to see the model(s) that could be used to handle this, I'm not sure how the threadpool idea would benefit from this proposal.
Certainly all the tasks to get executed by this threadpool would have to be defined before-hand, right? Even with the possibility to inject ModuleBlocks, I don't clearly see how that would help defining new methods for the inner threadpool (I guess we're not advocating for the injection of dynamic scripts through Blobs), or, even if we take these as one-off scripts, how their result would get passed to the main thread. In my eyes there would still be the need for a main engine, and that main engine should be aware of what it can do or not, importing whatever dependencies itself.
But once again, maybe it's just me who fails to envision all this, so I'd be very grateful for examples making it very clear what this proposal really solves and how.

@wanderview
Member Author

Sure. I'll take an action item to update the explainer with more use cases. I am catching up after being out of office, though, so there might be a delay.

Overall, though, I think it's about providing a mechanism that can be used kind of like GCD (Grand Central Dispatch). In particular, this combines well with the js blocks proposal.

I'll also try to better explain some of my concerns with making the constructor script URL cross origin. There are browser architectural and security concerns beyond API shape confusion.

Finally, maybe we could compromise here and offer a static convenience method which combines the two steps in addition to the current proposal. Something like Worker.createBlankAndAddModule(url).

@surma
Contributor

surma commented Aug 11, 2021

(I know I’m bikeshedding ahead of time here, but in my opinion, something like Worker.createBlankAndAddModule(url) is not really a significant DX improvement over new Worker({type: "module"}).addModule(url).)

@Kaiido
Member

Kaiido commented Aug 11, 2021

(I know I’m bikeshedding ahead of time here, but in my opinion, something like Worker.createBlankAndAddModule(url) is not really a significant DX improvement over new Worker({type: "module"}).addModule(url).)

This is not really bikeshedding though. In one case you only define a new way of starting a new Worker from any script, while in the other case you are creating a whole new paradigm where scripts can be injected from outside without the inner script having control over it.

As I understand it, bikeshedding over Worker.createBlankAndAddModule(url) would be to propose something like

new Worker("about:blank", { // or null or whatever
  type: "module",
  initialScript: "https://cdn.foo.com/my-worker-script.com" // or any other property name that makes sense
})

By the way, I'd like to challenge a bit the idea that these Worker.importScripts() & al. don't bring any security issues.
At least I (and I know a few others to which I've been bringing this maybe wrong idea) had the assumption that whatever exists in the Worker's context can't be reached from outside, unless the Worker context sends it itself.
Given this assumption, it's entirely possible that some code would for instance store a secret in the global scope of the Worker and feel safe since the other contexts can only reach it through whatever they'll postMessage() to them.
Adding the ability to inject scripts from outside out of the blue would break this assumption, and would expose the secret to whatever script injected by the other context (they just have to postMessage the secret back to the other side).

Another scenario could involve a SharedWorker, where each client is assumed to be unable to see what the other clients are doing with this SharedWorker. Once again, these Worker.importScripts() & al. methods would allow one page to inject a script that would break this assumption, and for instance overwrite the MessagePort.postMessage method so that this script can catch all the calls made to the other clients and receive them itself.

Sure writing sensitive code based on this assumption was a bad call if anyone did so, but I think this needs at least some consideration.
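To make the SharedWorker scenario concrete, here is a rough sketch of the kind of interception an injected script could perform. FakePort is a hypothetical stand-in for MessagePort so that the sketch runs outside a browser, and installInterceptor is an illustrative name, not an existing API.

```javascript
// Hypothetical stand-in for MessagePort so the sketch runs outside a browser.
class FakePort {
  constructor(name) {
    this.name = name;
    this.sent = []; // messages delivered to the client behind this port
  }
  postMessage(data) {
    this.sent.push(data);
  }
}

// What an injected script could do: wrap postMessage on the shared prototype
// and copy every outgoing message to a port that it controls.
function installInterceptor(PortClass, spyPort) {
  const original = PortClass.prototype.postMessage;
  PortClass.prototype.postMessage = function (data) {
    if (this !== spyPort) {
      original.call(spyPort, { from: this.name, data });
    }
    return original.call(this, data);
  };
}
```

After installInterceptor(FakePort, spyPort) has run, any message the worker sends to another client is also delivered to spyPort, without the original worker script being aware of it.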

@surma
Contributor

surma commented Aug 13, 2021

I want to re-emphasize @domenic’s point about reusing workers.

The main motivation that has been discussed so far is that it is annoyingly complicated to create a worker with a source file that’s being served from a CDN. The most common workaround I see is people using data URLs or Blob URLs combined with importScripts:

function crossOriginWorker(crossOriginSrc) {
  return new Worker(
    URL.createObjectURL(
      new Blob([
        `importScripts("${crossOriginSrc}");` // who needs sanitization
      ], {type: "text/javascript"})
    )
  );
}

However, another big problem is that Workers are hard to re-use. Instead, I see people creating a new worker each time, which is bad. Even if they are performance-aware enough to terminate() the old worker, it’s still pretty wasteful. We should be encouraging developers to run multiple, independent modules in a worker, just like people are currently running literally all their modules on the main thread.

I think this proposal is quite elegant in that it solves both problems with a minimal addition to the Worker API, while the new API parts already have precedent on the platform (see Worklets) and don’t rely on changing anything potentially related to origins and security. This API addition would be a great boon for a scheduler-like primitive (even in user land code) where the individual worker is abstracted away and the scheduler re-uses workers from a pool to run scripts.
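A user-land scheduler of that kind could be sketched as below. This assumes the proposed API: createWorker is a hypothetical factory that must return an object with an executeScript(url) method returning a promise, which no browser provides today. Round-robin is used only to keep the sketch short; a real scheduler would probably track which workers are idle.

```javascript
// Hypothetical sketch: `createWorker` must return an object with an
// executeScript(url) method that returns a promise, as in the proposal.
class WorkerPool {
  constructor(size, createWorker) {
    this.workers = Array.from({ length: size }, () => createWorker());
    this.next = 0; // index of the worker that receives the next script
  }

  // Hand the script to the next worker in round-robin order and
  // return that worker's promise to the caller.
  run(url) {
    const worker = this.workers[this.next];
    this.next = (this.next + 1) % this.workers.length;
    return worker.executeScript(url);
  }
}
```

Callers only ever see pool.run(url); which worker executes the script stays an implementation detail of the pool.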


To give my 2c on some questions:

I don’t think there’s cause for worry about worker code expecting to be in sole control of the Worker. Libraries that use workers seamlessly under the hood are not affected by this proposal (they are in control of creating the worker and no other code will get added). Libraries that expect to be importScripts()’d into a Worker should already be safe-guarded against receiving stray messages. Most of the time, in my experience, worker code is authored by the app developer (not a library developer) and messages are received, parsed and dispatched by custom code. This proposal is, in my opinion, beneficial for this setup.

@Kaiido I think any code that stored secrets on a global is already broken. Even without this proposal, you can’t know whether your code is in sole control of the worker global or whether it is just one of many scripts imported via importScripts().

@Kaiido
Member

Kaiido commented Aug 13, 2021

Thanks for restating these points, but I don't see the answers to most of the questions in #6911 (comment).

I still don't see how this proposal would really help that threadpool idea.
If people today use multiple workers it's because they don't want to write the engine to handle all the various operations that their worker should be doing, and instead create one Worker per (often dynamic) operation. Unless we are advocating for injecting dynamically created scripts (blob:// URIs), I'm not sure this would really help them. They'd still need an engine at least to coordinate the various messages from the main context, and if they took the time to write the engine, then it's not hard to also make that engine import whatever module it needs.

But once again, it's also possible that I may just be missing completely how all this should work from the Worker script point of view, and thus I'm still hoping for clear examples of usage.


Regarding the point of breaking the assumption that Worker's contexts are isolated, when today you write new Worker("myscript.js") you can be pretty sure of what runs there. These methods bring the possibility for any external scripts running in the main context to break that confidence.
I still believe we should not just hand-wave this point that fast, but give thorough thinking about it.


For what it's worth, I built a simple demo to try to convince myself of the potential benefits (or risks I must admit) of this proposal, available at https://glitch.com/edit/#!/worker-external-importscripts and the result is that I am still not really convinced.

@Jamesernator

Jamesernator commented Oct 27, 2021

In my experience, more often than not Workers expect to be self-contained - e.g. they will register onmessage handler without expecting any messages in a format different from theirs, so loading multiple such sources will fail spectacularly.

If people today use multiple workers it's because they don't want to write the engine to handle all the various operations that their worker should be doing, and instead create one Worker per (often dynamic) operation. Unless we are advocating for injecting dynamically created scripts (blob:// URIs), I'm not sure this would really help them. They'd still need an engine at least to coordinate the various messages from the main context, and if they took the time to write the engine, then it's not hard to also make that engine import whatever module it needs.

Yeah, so I think this is the main problem with just adding importScripts/addModule. What would significantly help, though, is a way to send a message channel (or readable stream, etc.) to an imported module/script.

One way could be to do the sort of thing comlink does, with a wrapper around the imported module (I don't know what an equivalent for importScripts would be), i.e.:

const worker = new Worker();

const workerModule = await worker.addModule(module {
    export function bigSum(count) {
        let sum = 0;
        for (let i = 0; i < count; i++) {
            sum += i;
        }
        return sum;
    }
});

// Module exported functions are wrapped with a message channel
const sum = await workerModule.bigSum(1000);

Alternatively, we could do this more explicitly by passing a message channel somehow. This is a bit lower level, but it would allow something like the above to be built on top of it:

const worker = new Worker();

const channel = new MessageChannel()

await worker.addModule(
    "default", // Module member to call 
    channel, // Data to pass to module export
    module {
        export default function start(messageChannel) {
            messageChannel.addEventListener("message", ({ data }) => {
                // Process on our own dedicated channel
            });
        }
    },
);

An in-between idea could be a remote-handle style of design (heavily inspired by Puppeteer/Playwright's API for working with objects through the debugger channel), i.e.:

const worker = new Worker();

const moduleHandle = await worker.addModule(module {
    export function bigSum(count) {
        // ...
    }
    
    export const x = 3;
});

const bigSumHandle = await moduleHandle.get("bigSum");
const result = await bigSumHandle.apply([10000]);

I have no idea how any of these ideas would work with importScripts, but there could be a document.currentScript analogue for establishing some channel (e.g. self.currentScriptData, worker.importScripts("./someScript.js", { channel: new MessageChannel() })). Alternatively, it may be that supporting classic scripts is just less important if something like .addModule is available anyway.

@josephrocca

josephrocca commented Oct 31, 2021

Strongly agree with @Jamesernator (I actually was going to quote the exact same two paragraphs from @Kaiido and @RReverser), and was going to mention that to improve web worker ergonomics and developer uptake of best practices, I think we really need to look at the popular libraries/frameworks that people are building around them (like comlink).

For me, the current friction around reusing workers for multiple modules comes almost entirely from setting up all the extra onmessage/postMessage boilerplate. Something like this (Jamesernator's first code block):

const worker = new Worker();
const workerModule = await worker.addModule(module { export function bigSum() { ... } });
const sum = await workerModule.bigSum(1000);

would be a wonderful improvement in the ergonomics of using workers, and would make it trivially easy to get expensive operations off the main thread and reuse the same worker for multiple modules.
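For comparison, the onmessage/postMessage boilerplate mentioned above tends to look roughly like the following today. This is only a minimal sketch: serveRpc, createRpcClient, and the { id, method, args } message shape are hypothetical names chosen for illustration, not part of any proposal in this thread.

```javascript
// Worker side: route each request to a named handler and post the result back.
// `scope` stands for the worker global (`self`); `handlers` is a plain object of functions.
function serveRpc(scope, handlers) {
  scope.addEventListener("message", async ({ data }) => {
    if (!data || typeof data.method !== "string") return;
    const { id, method, args } = data;
    try {
      const result = await handlers[method](...args);
      scope.postMessage({ id, result });
    } catch (err) {
      scope.postMessage({ id, error: String(err) });
    }
  });
}

// Main side: correlate replies with requests by id and expose a promise-based call().
function createRpcClient(worker) {
  let nextId = 0;
  const pending = new Map();
  worker.addEventListener("message", ({ data }) => {
    const entry = data && pending.get(data.id);
    if (!entry) return;
    pending.delete(data.id);
    if ("error" in data) entry.reject(new Error(data.error));
    else entry.resolve(data.result);
  });
  return (method, ...args) =>
    new Promise((resolve, reject) => {
      const id = nextId++;
      pending.set(id, { resolve, reject });
      worker.postMessage({ id, method, args });
    });
}

// Handler matching the bigSum example from earlier in the thread.
const handlers = {
  bigSum(count) {
    let sum = 0;
    for (let i = 0; i < count; i++) sum += i;
    return sum;
  },
};
```

In a browser, the worker script would call serveRpc(self, handlers), and the page would call createRpcClient(new Worker("worker.js")) and then await call("bigSum", 1000). Every library that wants to share a worker has to agree on a protocol like this, which is the friction the addModule idea would remove.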

@Kaiido
Member

Kaiido commented Dec 19, 2021

Yes, I agree with #6911 (comment) and #6911 (comment): this would be very useful. It even lets us dream of a ThreadPoolWorker where the UA could automagically manage the multi-threading itself.

Not so useful for the initial use-case of starting a Worker from a CDN hosted script though.

@lgarron

lgarron commented Jan 30, 2022

I would really love to see this idea implemented in browsers.

There are certainly workarounds for a given website, but it's basically impossible to maintain a library that uses web workers without spending countless hours on testing against actual browsers and bundlers. We've had to file a lot of bugs and switch bundlers over this very issue. The proposal here introduces a situation where the semantics are clear, and where I would hope that a new function like executeScripts() or addModule() is clearly understood to be a reference to an entry file that needs to be preserved (similar to a dynamic import).

I see one challenge here that would be addressed by an alternative like module blocks: passing in classes or modules at runtime. This is particularly valuable for two reasons:

  • Packaging a library into a single file, instead of requiring a separate entry file for any worker.
  • Plugins or dynamic code that can be passed into a worker without having to program it into the worker ahead of time.

I would love it if that could be addressed here as well, although unfortunately I don't see an ergonomic way that isn't basically equivalent to module blocks. (Although addModule as described in this issue would certainly allow any static entry file to execute and register itself with already-running worker code by communicating through globalThis.)

@lgarron

lgarron commented Jun 2, 2023

With the release of Firefox 114 next Tuesday, all major browsers will support module workers. I'm personally really looking forward to this, as it allows me to remove a lot of workarounds. However, I still have to manage a large stack of remaining workarounds so that our code can be used from npm and from a CDN directly.

Due to the need to use a new Blob(…) trampoline, I've had to spend a large amount of time debugging bundlers, filing issues, and trying to advocate for bundlers not to break my carefully constructed workarounds [1]. It would really save a lot of pain to have any of the instantiation options in this proposal.

  1. If this is stuck due to the need to solve too many use cases, would it be possible to move forward with a minimum forwards-compatible solution that solves the CDN use case, such as this or this?
       • As others have pointed out, this would not be more powerful/unsafe than a more general API.
       • I don't think it matters to code authors whether the script URL is about:blank, about:blankjs, or something else, and I don't envision writing code that would break if a spec would change this in a future revision.
  2. Is there anything I can do to help move this forward at the moment?

Footnotes

  1. Basically 100 lines of workaround code across 4 files that are all just different ways to emulate new Worker(import.meta.resolve("./worker.js"), {"type": "module"}) across enough environments.

@josephrocca

josephrocca commented Sep 8, 2023

I wonder who would be best to ping for a status update here? Maybe it's kinda blocked on some other, related proposal like module expressions and/or module declarations? Hesitantly pinging @nicolo-ribaudo in case you can provide any perspective/info here 🙏

Just kind of desperately hoping we can get to something like this eventually (discussed earlier in this thread):

const worker = new Worker();
const workerModule = await worker.addModule(module { export function bigSum() { ... } });
const sum = await workerModule.bigSum(1000);

@domenic
Member

domenic commented Sep 8, 2023

This is actually just blocked on implementation/spec/tests work. If you're able to contribute, please do!

@annevk
Member

annevk commented Sep 8, 2023

I do worry about the race we would be introducing for SharedWorker here. Depending on which document creates it you could end up with a wildly different container policy. For Worker the general idea seems okay, although we should evaluate if container policy contains all the relevant policies we need to copy over.

lgarron added a commit to lgarron/worker-execution-origin that referenced this issue Sep 8, 2023
@lgarron

lgarron commented Sep 8, 2023

> This is actually just blocked on implementation/spec/tests work. If you're able to contribute, please do!

Sounds like a deal. In case it gets things out the door: https://github.com/lgarron/worker-execution-origin

Or if it's preferable to implement the full blank Worker proposal, I can adapt the test code there to be more general.

@domenic
Member

domenic commented Sep 9, 2023

Hmm. The alternate proposal is interesting. I'm unsure whether it meets the various constraints people had in mind here. It certainly is less powerful; it doesn't solve the use cases I'm mildly passionate about, around allowing multiple scripts to be run in the same worker. (As such, it wouldn't help address my concerns about module blocks + worker integration.)

But I'm unsure if we have anyone who would object, if you were to implement that alternative proposal in browsers yourself.

BTW, in case you weren't aware, we're looking for tests in the web platform tests format.

@asutherland

A major advantage of the blank worker approach over the https://github.com/lgarron/worker-execution-origin proposal is that we sidestep many of the issues raised in #9571. I feel that it's significantly easier to reason about what's going on with the explicit semantics the explainer in the first comment currently has of any loads explicitly not being top-level loads:

> These methods would act as if they sent a postMessage() to the worker which then invoked importScripts() or addModule() in the worker context. It would then postMessage() back to the owning context, indicating that the script execution was completed. This would then resolve the promise returned from w.executeScripts().
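The round trip described in the explainer passage above can be approximated in userland. The following is a minimal sketch only, assuming a classic worker that has already installed a small bootstrap handler; executeScripts and installBootstrap as written here, and the message shape they exchange, are hypothetical and merely imitate the described semantics.

```javascript
// Main-thread helper: a hypothetical emulation of the proposed w.executeScripts().
let nextRequestId = 0;

function executeScripts(worker, ...urls) {
  const id = nextRequestId++;
  return new Promise((resolve, reject) => {
    const onMessage = ({ data }) => {
      // Ignore unrelated traffic on the same worker.
      if (!data || data.type !== "executeScripts:done" || data.id !== id) return;
      worker.removeEventListener("message", onMessage);
      if (data.error) reject(new Error(data.error));
      else resolve();
    };
    worker.addEventListener("message", onMessage);
    // Step 1: ask the worker to load the scripts.
    worker.postMessage({ type: "executeScripts", id, urls });
  });
}

// Worker-side bootstrap: `scope` stands for the worker global (`self`).
function installBootstrap(scope) {
  scope.addEventListener("message", ({ data }) => {
    if (!data || data.type !== "executeScripts") return;
    try {
      // Step 2: the load happens inside the worker, so it is not a top-level load.
      scope.importScripts(...data.urls);
      // Step 3: report completion so the caller's promise can resolve.
      scope.postMessage({ type: "executeScripts:done", id: data.id });
    } catch (err) {
      scope.postMessage({ type: "executeScripts:done", id: data.id, error: String(err) });
    }
  });
}
```

A native implementation would not need the bootstrap script at all, which is the point of the blank worker: the sketch still requires a same-origin (or blob:) script to start the worker.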

In general, I would also echo @domenic's desire to support multiple scripts in the same worker. We are addressing some technical debt in the Firefox Workers implementation, and one thing that has become very clear is that it is only possible to GC workers in the most trivial cases. So it is desirable to encourage the threadpool idiom @domenic describes rather than an idiom where many one-off workers are created, because it sidesteps a variety of pathological resource leak scenarios.

Also, from an implementation perspective, I think for Firefox we are much more likely to be able to implement the blank worker proposal in a timely fashion. The worker execution origin proposal would be significantly more scary to implement because it would challenge several existing assumed invariants in particularly hairy code.

@annevk
Member

annevk commented Sep 9, 2023

The way I read https://github.com/lgarron/worker-execution-origin#proposal-details the passed in URL is fetched as a subresource of the newly created worker, so I think it's fairly equivalent to the blank worker proposal. But you're right that it doesn't encourage reuse of the worker global.

Side note: In addition to no SharedWorker, I would suggest that we also don't offer classic workers in this fashion.

@lgarron

lgarron commented Sep 11, 2023

> The way I read https://github.com/lgarron/worker-execution-origin#proposal-details the passed in URL is fetched as a subresource of the newly created worker, so I think it's fairly equivalent to the blank worker proposal. But you're right that it doesn't encourage reuse of the worker global.

Indeed, my goal was to describe something that should be no more controversial or difficult to ship than blank Workers.
But if @asutherland has concerns, I don't have a horse in this race — I'm much more invested in the ecosystem improvements of getting anything across the finish line.

> Side note: In addition to no SharedWorker, I would suggest that we also don't offer classic workers in this fashion.

For what it's worth, this works for me.

Modern code bases use relative URLs to refer to related files, and it's pretty much impossible to write portable Worker code without using new URL(…, import.meta.url) or import.meta.resolve(…) to do so. Since both of those are ESM-only, I definitely view classic workers as a legacy feature (but understand that some bundlers still wish to target them).
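To illustrate the relative-URL pattern mentioned above, here is a minimal sketch. The resolveWorkerURL helper and the cdn.example address are hypothetical; in an ES module the base would simply be import.meta.url.

```javascript
// Resolve a worker entry file relative to the module that references it.
// In an ES module, `baseURL` would be `import.meta.url`; it is a parameter here
// so the sketch also runs outside a module context.
function resolveWorkerURL(relativePath, baseURL) {
  return new URL(relativePath, baseURL).href;
}

// Hypothetical CDN location of the library's entry module.
const workerURL = resolveWorkerURL("./worker.js", "https://cdn.example/lib/index.js");
// workerURL is "https://cdn.example/lib/worker.js"
```

In a browser module this usually appears inline as new Worker(new URL("./worker.js", import.meta.url), { type: "module" }). When the library is served from a cross-origin CDN, that call still fails because of the same-origin restriction this issue is about.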

@lgarron

lgarron commented Sep 11, 2023

> BTW, in case you weren't aware, we're looking for tests in the web platform tests format.

Thanks! I've never written one before. Do you think https://github.com/web-platform-tests/wpt/blob/cd2e11b07bc04f02366ab93e5df41bf3cfc5cf95/resource-timing/cross-origin-iframe.html and https://github.com/web-platform-tests/wpt/blob/cd2e11b07bc04f02366ab93e5df41bf3cfc5cf95/xhr/open-url-worker-origin.htm would be good tests to start from, or should I be starting somewhere more basic?

@domenic
Member

domenic commented Sep 14, 2023

That one has a bit of extra stuff to handle things specific to that API, so here's a simpler sample: https://github.com/web-platform-tests/wpt/blob/26d5ce16a6f5e51429de12f25d3c011c4554ee32/html/browsers/history/the-history-interface/pushstate-replacestate-empty-string/pushstate-base.html . In general, https://web-platform-tests.org/writing-tests/testharness.html might be a good starting point.

@jakearchibald
Contributor

ShadowRealm is now stage 3, and I think it has ideas we could borrow.

const worker = new Worker('about:blankjs', { type: 'module' });

// Import into worker:
await worker.importValue(specifier);

// Import and get export:
const value = await worker.importValue(specifier, exportName);

importValue would throw if the worker is not type: 'module'.

The export can be anything structured cloneable, but can be or can include functions.

When functions are called, the args are cloned and the function in the worker is called with those args. The return value is cloned, and used to resolve the function on the caller's side.

For example:

worker-utils.js

export function createNumbersArray(length) {
  return Array.from({ length }, (_, i) => i);
}

index.js

const worker = new Worker('about:blankjs', { type: 'module' });
const createNumbersArray = await worker.importValue('./worker-utils.js', 'createNumbersArray');

const numbersArray = await createNumbersArray(3);
// [0, 1, 2]
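The clone-in, clone-out behaviour described above can be simulated locally without any worker. This is only a sketch of the semantics, not the proposed API: wrapWithCloneSemantics is a hypothetical helper, and it uses the standard structuredClone() to stand in for the cross-thread copy.

```javascript
// Hypothetical helper: expose `fn` the way an export obtained through
// importValue might behave, with cloned arguments and a cloned, promised result.
function wrapWithCloneSemantics(fn) {
  return async (...args) => {
    // Arguments are cloned before the call, so the callee never sees caller objects.
    const clonedArgs = structuredClone(args);
    const result = await fn(...clonedArgs);
    // The return value is cloned again on the way back to the caller.
    return structuredClone(result);
  };
}

function createNumbersArray(length) {
  return Array.from({ length }, (_, i) => i);
}

// Calling this returns a promise, just as the worker-backed version would.
const remoteCreateNumbersArray = wrapWithCloneSemantics(createNumbersArray);
```

One consequence worth noting is that mutations made by the callee to its arguments are not visible to the caller, and values that cannot be structured-cloned cause the call to reject.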

@Jamesernator

> The export can be anything structured cloneable,

One caveat that should probably be considered is how to pass transferable objects (e.g. message ports, offscreen canvases) through such functions.

Two possible approaches:

Auto transfer

Certainly the most convenient, especially for arbitrary modules that aren't aware they're in a worker context.

This would look like:

export async function createRenderer(offscreenCanvas) {
     const ctx = offscreenCanvas.getContext("webgpu");
     
     // ...setup context etc
     
     const { port1, port2 } = new MessageChannel();
     
     port1.onmessage = ({ data }) => {
          // render the frame somehow
     }
     
     // Auto transferred
     return port2;
}

const createRenderer = await worker.importValue("./renderer.js", "createRenderer");

// Auto transfers the offscreen canvas
const framePort = await createRenderer(someCanvasElement.transferControlToOffscreen());

function renderFrame() {
     const data = getInfoFromDOMSomehow();
     
     framePort.postMessage(data);
}

Explicit

More like existing APIs, but means we need to provide some additional methods to actually provide the transfer list. However within the worker this is a bit of a footgun as we need to expose the transferList somehow.

Usage might look something like:

export async function createRenderer(offscreenCanvas) {
     // ...same as previous example...
     
     // Explicit transfer with value and transferList, this might be a footgun though
     // for people expecting just plain returns of values to work
     return { value: port2, transferList: [port2] };
     
     // ALTERNATIVE:
     // Provide the transfer list on the `this` value
     this.transferList.push(port2);
     return port2;
}

const createRenderer = await worker.importValue("./renderer.js", "createRenderer");

const offscreenCanvas = someCanvasElement.transferControlToOffscreen();
const framePort = await createRenderer
     .callWithTransfer(offscreenCanvas, [offscreenCanvas]);

@domenic
Member

domenic commented Jan 19, 2024

I think this idea of providing more ergonomic transfer/clone operations for workers is separate from the blank worker proposal. So let's move it to another thread, if people want to continue discussing.

@jakearchibald
Contributor

Moved to #10078
