Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integration with import-attributes #5640

Closed
xtuc opened this issue Jun 12, 2020 · 70 comments
Closed

Integration with import-attributes #5640

xtuc opened this issue Jun 12, 2020 · 70 comments
Labels
integration Better coordination across standards needed topic: html modules The HTML modules proposal topic: script

Comments

@xtuc
Copy link

xtuc commented Jun 12, 2020

The proposal https://github.com/tc39/proposal-import-attributes, currently stage 2, provides a way for developers to add attributes to the in the import, for instance:

import json from "/foo.json" with type: "json";

The host will ensure that the resource is of type JSON.

In tc39/proposal-import-attributes#61 we are discussing if the host would send a header indicating the expected type to the server, I suppose that would be Accept. We decided that the type attribute isn't part of the module cache key.

On the other hand, tc39/proposal-import-attributes#16 shows that multiple representation of a resource may exist; BinAST and JavaScript for example.

Does sending a header based on the module type or other semantics make sense to you?

@littledan
Copy link
Contributor

Let's also track the HTML integration PR for module attributes in this issue.

A big issue with sending an Accept header based on the type would be, what should be done if a module is already in the module map, but then it's imported with a different type? Should there be a second fetch in that case? What do we do if the response is different? This issue makes me suspect that we shouldn't send different HTTP headers due to the type attribute.

@domenic
Copy link
Member

domenic commented Jun 12, 2020

I think the type should be part of the module map cache key to solve those issues.

@littledan
Copy link
Contributor

littledan commented Jun 12, 2020

@domenic Could you say why? If it's important to send different HTTP headers, which should we send? (Note that the previous JSON module patch didn't send different HTTP headers AFAIK, and the module type was not part of the cache key.)

@littledan
Copy link
Contributor

To be concrete, the conflicting intuition from JavaScript developers about making the type part of the cache key would be that a module specifier points to a unique module; that it's not "cloned" just because different attributes are used. While this rules out a lot of potential use cases for module attributes, I thought it would be consistent with the requirements for type on the Web.

The current proposal draft contains a MUST-style requirement that modules not be cloned, but TC39 understands that hosts might willfully violate this requirement (as HTML has willfully violated the JS spec in the past), and is open to reconsidering this requirement in the future. See this section of the import attributes README for details.

@jakearchibald
Copy link
Contributor

fwiw, we don't make credentials mode part of the cache key, even though it changes the shape of the request.

@domenic
Copy link
Member

domenic commented Jun 15, 2020

Different import statements should cause different cache keys, in my thinking. Otherwise we would just use an import specifier microsyntax.

@jakearchibald
Copy link
Contributor

I do like the idea of being able to import the same module twice with different types.

@littledan
Copy link
Contributor

littledan commented Jun 15, 2020

fwiw, we don't make credentials mode part of the cache key, even though it changes the shape of the request.

OK, then maybe it's OK to "race" here and let the first thing that loads the module specifier determine the shape of the request. In this case, how should the request to the server be modified based on the module type?

Different import statements should cause different cache keys, in my thinking. Otherwise we would just use an import specifier microsyntax.

I don't understand this assertion. I'm still not a fan of import specifier microsyntax, for reasons discussed elsewhere (e.g., tc39/proposal-import-attributes#11).

I do like the idea of being able to import the same module twice with different types.

If we want module attributes that can transform the interpretation of a module, while others checked things about the module (like the MIME type), what if we supported two different kinds of attributes? It could be like this:

import { foo } from "specifer" if { checkKey: "value" } with { transformKey: "value2" };

Since some module attributes are checks and some are transforms, making them syntactically separated would let developers understand which parts form part of the cache key/extended specifier and which parts don't.

Based on previous conversations, I thought type was a check. I still don't understand why this isn't the case.

cc @syg

@domenic
Copy link
Member

domenic commented Jun 15, 2020

type is definitely a transform. It changes the interpretation of the bytes received.

@littledan
Copy link
Contributor

I also wanted to mention: about @jakearchibald 's suggestion to allow you to import a module as "text" and have that not be a check but rather a change in interpretation (presumably this would only be permitted for same-origin specifiers): My suggestion was that we use a separate attribute for these changes of interpretation, using type to check the MIME type. See past discussion: tc39/proposal-import-attributes#43 . I haven't yet heard an explanation of the downside of this approach, where type checks the MIME type (which is what we need to avoid privilege escalation), and other attributes could change the interpretation of a module.

@littledan
Copy link
Contributor

Concretely: if type is a transform, what do you want it to do on the Web? I've only heard about semi-concrete proposals for module interpretations for things like WebAssembly, JSON, CSS, HTML, and MessageFormat modules, each of which would have its own MIME type, so one can't be interpreted as the other. I'd be interested to learn more about the kinds of transforms you might want to do based on the type. Is this all about the HTTP headers, or are there more changes you have in mind?

@xtuc

This comment has been minimized.

@littledan
Copy link
Contributor

littledan commented Jun 15, 2020

What's the problem with using a separate attribute key for this case of reinterpretation as opposed to MIME type checking? We several requests from JavaScript developers who were worried about developer confusion coming from a single module specifier leading to multiple module identities, so I wanted to at least give predictable behavior to type in terms of not doing that. The with/if split would make it even more visible in general.

@domenic
Copy link
Member

domenic commented Jun 15, 2020

I don't think adding more complexity is the right answer here. Asking developers to figure and memorize the mapping between module type and with vs. if is unlikely to work. Especially given that the alternative is using build tools (which require neither) or HTML using a single microsyntax (which would be simpler).

@domenic
Copy link
Member

domenic commented Jun 15, 2020

type checks the MIME type (which is what we need to avoid privilege escalation), and other attributes could change the interpretation of a module.

Type does not just check the MIME type. It changes the interpretation. If you interpret { "foo": "bar" } as a JS file, then there is no exports. If you interpret it as a JSON file, then it has a default export. It's not just a check.

@littledan
Copy link
Contributor

It's not just a check.

To clarify, the semantics that we were proposing was that modules would remain strict about their MIME type: type: "json" could only be used with a JSON MIME type, and no explicit type could only be used with a JavaScript MIME type. So, it would be a check that the MIME type was as expected, followed by interpretation of the module driven by the MIME type.

@annevk
Copy link
Member

annevk commented Jun 16, 2020

Like Daniel, I also thought type was a fatal assert. Overloading it to support other non-executing use cases (e.g., get the response as Blob) seems reasonable though.

@xtuc
Copy link
Author

xtuc commented Jun 16, 2020

Sorry, I have hidden my previous comment to avoid confusion.
To clarify, the type attribute is a fatal assert, as @littledan mentioned it's a strict match against the MIME type.
We are not proposing transformation or reinterpretation of modules in this proposal.

@annevk
Copy link
Member

annevk commented Jun 24, 2020

@domenic could you expand on your point of view on how it's not a fatal assert? I've read through this issue and the PR and I'm still not sure why you are proposing type as an additional key.

@domenic
Copy link
Member

domenic commented Jun 24, 2020

Sure. There are several considerations here. They all revolve around the fact that the module map is crucial to the fetching infrastructure.

  1. Observable behavior:

    • An import of a JS module deep in the tree with if { type: "json" } must not poison the well so that future fetches with if { type: "javascript" } fail.
    • Two imports being fetched at the same time, one with if { type: "json" } and another with if { type: "javascript" }, must not cause each other to block: they must both fetch in parallel, with one failing (and that failure getting cached under the appropriate cache key, so that future fetches with that same if also fail) and the other succeeding (and that success getting cached under the appropriate cache key, so that future fetches with the same if also suceed.) That is, imports with different if values must not interfere with each other.
    • Two imports at the same time, with the same if clause, must be deduplicated (as they are today).
  2. Specification infrastructure: We have a perfectly good way of accomplishing the above. It is by using the module map to track ongoing fetches and failures. Expanding it to support import attributes requires only a very tiny modification to the existing HTML spec (and implementation) infrastructure: changing the module map key to contain both the URL and import type.

    You may be able to reproduce the above observable behavior via contortions, e.g. at various points there have been draft PRs which remove certain failures from the module map, or use a parallel caching infrastructure besides the module map, or otherwise intervene with the normal functioning of the module map in order to artificially avoid the simple architecture.

    It's not clear whether those contortions have successfully reproduced the above behavior. And it's quite likely that they create observable behavior differences in other edge cases beyond the few above that I've been able to come up with in 5 minutes of thought. But it just doesn't matter. That complexity is not acceptable when there is a simple alternative available.

@annevk
Copy link
Member

annevk commented Jun 24, 2020

Thanks. That addresses the "what", but it's still not clear to me "why". Why would the same specifier be used for a different type of resource? Are you anticipating/envisioning client-side-driven content negotiation? Are there proposals to change the Accept header accordingly? Why would this not apply to a changing referrer? How is that different?

@domenic
Copy link
Member

domenic commented Jun 24, 2020

It's not really sensible for authors to use different types on the same specifier, but the important thing is that if they do, they should get sensible behavior (and not, e.g., enable poisoning of the module graph for other parts of the app).

@annevk
Copy link
Member

annevk commented Jun 24, 2020

How can they use different types? Why are they more likely to poison the module graph this way than through referrer poisoning?

@domenic
Copy link
Member

domenic commented Jun 24, 2020

For example, if app.mjs includes library-1.mjs which does

import "https://example.com/library-2.mjs" if { type: "json" };

and we cached the resulting failure (like we currently cache all failures), that would poison the module graph so that app.mjs could not do

import "https://example.com/library-2.mjs" if { type: "javascript" };

There are multiple ways to fix this, but by far the most natural way is to make the type part of the cache key, so that we cache a failure for (https://example.com/library-2.mjs, json) but not for (https://example.com/library-2.mjs, javascript).

(This illustrates just one of my three "observable behavior" sub-bullets, FWIW; similar examples can be constructed for the other two, plus probably other scenarios.)

@frank-dspeed

This comment has been minimized.

@jkrems

This comment has been minimized.

@frank-dspeed

This comment has been minimized.

@frank-dspeed

This comment has been minimized.

@jkrems

This comment has been minimized.

@syg

This comment has been minimized.

@frank-dspeed

This comment has been minimized.

@littledan
Copy link
Contributor

littledan commented Sep 28, 2020

(Apologies for the length of this post.)

In the September 2020 TC39 meeting, TC39 reached consensus on advancing Import Assertions to Stage 3 on the basis of permitting HTML to do what it does in #5883: use the assertion as part of the cache key, but not reinterpret the module on the basis of this assertion. I think it's great that TC39 expressed flexibility and technical deference to work together here. TC39 also left up to hosts the ability to decide how to treat assertions that it doesn't know how to handle (with many in the community expressing a preference for ignoring them, so the assertions "pass", but no requirement made on hosts).

The HTML PR at #5883 takes a middle path: It treates the type: assertion as part of the cache key, and it rejects unrecognized assertions. This is not the worst combination for forward evolution: it permits us to take a different policy for other assertions, permitting them to be normalized or treating them as not part of the cache key entirely. I want to argue here that many other import assertion we could add wouldn't be part of the cache key, and in general, it would be more sensible to omit import assertions from the module map key.

When thinking about this problem, we previously got a bit caught up on the details of SRI (which wouldn't be the best application, as it would be better to transmit out-of-band). However, other predicates
over modules would also be possible, for example, checking that the referenced module subgraph:

  • Does not contain top-level await (as was broadly requested when top-level await was proposed, but arguably is inappropriate)
  • Does not import a particular module, or module type
  • Sticks within a particular language subset (e.g., "no side effects", but maybe there's no well-defined useful subsets?)
  • Does not import any code outside of its origin (however, maybe this is best left to CSP?)

In general, I think any assertion at all would want to have the property that, if you independently import the module without the assertion, you're still talking about the same module (maybe module "server shenanigans"). For the case of type: "javascript", @domenic suggested above that it would be possible to normalize the lack of an assertion to an explicit type: "javascript", so that it's referring to the same thing. Given that a missing type: assertion for a non-JS module is an error, we're completely covered, and there's no chance of unintentionally duplicating the module.

However, it's not clear to me how to extend this strategy to any of the above assertions. One way would be for "normalization" to remove them. Is this the intended path?

If unrecognized assertions were permitted and included as part of the module map key "unnormalized", then the strategy of "normalizing" certain recognized import assertions would lead to scenarios with two different module copies in older browsers, and the same module in newer browsers. This behavior doesn't match the transparent upgrade that ignoring unrecognized attributes is intended to provide. So it's good that this PR, at least, does ban unrecognized assertion keys and types, to preserve the ability to exclude future assertion keys from the module map key.

I'd prefer to address this issue by having HTML not treat import assertions as part of the cache key, as in an earlier version of the integration PR. I still don't understand the technical issue with these semantics. It seems to me like a logical error to treat assertions as part of the cache key, as they are just checking something about the module, not logically referring to a different one, and the above examples make this mismatch concrete. This change could enable us to ignore, rather than reject, unrecognized assertions.

@annevk
Copy link
Member

annevk commented Sep 29, 2020

I think @domenic's concern would still apply. With not treating asserts as part of the cache key you end up with race conditions if you or your large software project is not consistent with assertions.

I'm not entirely sure I understand the counter argument. Why wouldn't a browser either completely ignore or always fail on unsupported assertions? That seems more logical to me than trying to do something with them.

@littledan
Copy link
Contributor

@annevk I'm having trouble understanding the race condition issue. I was imagining that each time the module is imported, the assertion would be checked (possibly with evaluation of this assertion being cached inside the module itself, if we add non-trivial assertions). I think @dandclark addressed all the race conditions in the specification mechanics. If you could give an example of a scenario where such a race condition would occur, then it would help me understand better.

I'm not entirely sure I understand the counter argument. Why wouldn't a browser either completely ignore or always fail on unsupported assertions? That seems more logical to me than trying to do something with them.

My suggestion is that we could adopt the behavior of completely ignoring the unsupported assertions only if they're excluded from the module map key. If they're included in the module map key, then "ignoring" them would present forward-compatibility issues if they're later normalized, so in this case, it's best to do as the current PR does and always fail on them. (I'm not sure what you mean by doing something with them; maybe my writing above was unclear.)

@annevk
Copy link
Member

annevk commented Sep 29, 2020

I guess that model still confuses me. I would expect that once something is in the module map there would not be a way to get something else out of the same entry. But yeah, you're right that it could work without race conditions.


I was thinking that either you ignore it completely, or you support, normalize, and use it as part of the key.

@littledan
Copy link
Contributor

I guess that model still confuses me. I would expect that once something is in the module map there would not be a way to get something else out of the same entry. But yeah, you're right that it could work without race conditions.

The only things you'd get out would be the single module or an error. Would this be confusing?

I was thinking that either you ignore it completely, or you support, normalize, and use it as part of the key.

I'd be in support of this change. The current PR instead causes an error for unknown modules (partly since I was under the impression that, with @domenic 's requirements, we'd have to treat all assertions as part of the key). We could change this to ignore and permit unknown assertions.

@dandclark
Copy link
Contributor

dandclark commented Sep 29, 2020

I was thinking that either you ignore it completely, or you support, normalize, and use it as part of the key.

Is it not still an option to reject unknown assertions (by failing the import) without committing to whether or not potential future assertions should be part of the module cache key? In #5883 (and #5658), unknown assertions are currently rejected at the very beginning of fetch a single module script, before anything is fetched or added to the module map. So there are no side effects of a rejection at this point other than failing the module graph, and this behavior does not box us in to any decisions about whether future potential assertions are or are not part of the module map cache key. And since the rejection occurs before fetching/caching, I don't see what race condition could occur.

I'm not strictly against ignoring unknown assertions, but rejecting them seems more future proof and safer. If some new security-focused assertion is introduced, I would expect developers to prefer such assertions to fail on browsers where they are not yet supported, rather than be ignored silently.

@ljharb
Copy link
Contributor

ljharb commented Sep 29, 2020

@dandclark failing on unknown assertions makes it impossible to ship code that can work in both pre- and post- environments (ie, an env that doesn't yet understand a new assertion, plus an env that does). The most future-proof and safe thing is absolutely to ignore them, otherwise the adoption penalty for new assertions is too high.

@dandclark
Copy link
Contributor

Thanks @ljharb, that makes sense.

I changed the PRs for import assertions + JSON modules and import assertions alone to completely ignore unknown assertions. Unrecognized values of the type assertion still cause failure, and type still used in the module map cache key.

@littledan
Copy link
Contributor

@dandclark Thanks for these updates. With them, #5883 looks good to me. I don't think we should land #5658 because it overlaps with the TC39 JSON modules effort.

@dandclark
Copy link
Contributor

Hmm, currently #5883 (and #5658) checks for unknown values of the type assertion at the beginning of fetch a single module script, returning null and bailing out if there is an unrecognized type value. I think it might be more appropriate to instead perform this check in create a module script, at the same point that resolve a module specifier is called on all of the [[RequestedModules]] to check for failure. An assertion with an unsupported module type would raise a TypeError, just like an invalid URL specifier. Any downsides to this that I'm missing?

This is getting into the weeds a bit, not sure if this discussion would be better off in #5883.

@littledan
Copy link
Contributor

@dandclark I'm inclined to just agree to whatever's simpler/more consistent, but out of curiosity: Can you give an example of an observable difference from this change? Is it about which error class is thrown when the module graph fails to load, or does this affect timing?

@dandclark
Copy link
Contributor

In terms of complexity I think both options are pretty similar.

One observable change would be whether or not window.onerror is fired. If we raise a parse error like we do for invalid specifiers, window.error will be fired. If we instead just bail out of fetch a single module script with null, there will be no window.onerror, which is similar to what happens if the fetch 404's. See https://jsfiddle.net/dandclark_msft/hw0pkzqu/ where an invalid module specifier and a "real" parse error trigger window.onerror but the 404 does not.

I think another observable difference would be whether, for the following script foo.js, a.js is fetched:

import "./a.js";
import "./b.js" assert { type: "notARealType"};

If notARealType is checked when create a module script runs for foo.js, then a.js will not be fetched. If we wait to check until the beginning of fetch a single module script for the second import statement, then the fetch for a.js would already have been kicked off by the time the check happens.

In both cases I prefer the behavior of checking in create a module script. Behaving similarly to an invalid module specifier seems conceptually right, and failing earlier rather than waiting until the descendants are fetched also seems better.

@dandclark
Copy link
Contributor

Actually if the check is removed from fetch a single module script then it would need to be added to fetch an import module script graph in addition to create a module script, just like the resolve a module specifier calls are duplicated in both places. So it's marginally more complex...but making the behavior of an invalid assertion match that for an invalid module specifier stills seems correct to me.

@dandclark
Copy link
Contributor

I kind of talked myself into it, so I made the change (c225d56). I can back it out if I'm alone in thinking this is the right approach.

domenic pushed a commit that referenced this issue Jul 14, 2021
This change extends the JavaScript module system integration to include CSS module scripts, in addition to JavaScript module scripts. These allow web developers to load CSS into a component definition in a manner that interacts seamlessly with other module script types.

Explainer document: https://github.com/w3c/webcomponents/blob/gh-pages/proposals/css-modules-v1-explainer.md

Closes WICG/webcomponents#759. Part of #4321.

This change includes the integration for the import assertions proposal (https://github.com/tc39/proposal-import-assertions/), thus closing #5640 and closing #5883 by superseding it.
@domenic domenic closed this as completed Jul 29, 2021
mfreed7 pushed a commit to mfreed7/html that referenced this issue Jun 3, 2022
This change extends the JavaScript module system integration to include CSS module scripts, in addition to JavaScript module scripts. These allow web developers to load CSS into a component definition in a manner that interacts seamlessly with other module script types.

Explainer document: https://github.com/w3c/webcomponents/blob/gh-pages/proposals/css-modules-v1-explainer.md

Closes WICG/webcomponents#759. Part of whatwg#4321.

This change includes the integration for the import assertions proposal (https://github.com/tc39/proposal-import-assertions/), thus closing whatwg#5640 and closing whatwg#5883 by superseding it.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
integration Better coordination across standards needed topic: html modules The HTML modules proposal topic: script
Development

No branches or pull requests