Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Globaldata: bikeshed feature and constructor names #3536

Closed
sffc opened this issue Jun 14, 2023 · 4 comments
Closed

Globaldata: bikeshed feature and constructor names #3536

sffc opened this issue Jun 14, 2023 · 4 comments
Labels
discuss-triaged The stakeholders for this issue have been identified and it can be discussed out-of-band

Comments

@sffc
Copy link
Member

sffc commented Jun 14, 2023

What should the feature be called that enables globaldata?

What name should we use for the constructors?

We had discussed in #2945 (comment) that the constructor names should clearly indicate the provenance of the data, but @robertbastian raised some concerns in #2945 (comment) that we should discuss further.

Discuss with:

Optional:

@sffc sffc added discuss Discuss at a future ICU4X-SC meeting discuss-triaged The stakeholders for this issue have been identified and it can be discussed out-of-band labels Jun 14, 2023
@robertbastian
Copy link
Member

My order of preference for feature names:

  1. compile_data: I think this is most accurate, it compiles the data into the binary. I doesn't imply where the data comes from, which is correct as the feature also allows to compile client-provided data into the binary. In documentation I'd like to call the provider-less constructor the "compile-time-data constructor", and the unstable, buffer, and any constructors the "runtime-data constructors", which would align nicely with this name.
  2. compiled_data: Sounds a bit more like we already compiled that data in, but the user can actually do this themselves.
  3. data: Most concise. I don't think this is ambiguous on the component crates; on infrastructure crates like icu_provider it could be, but on a component it can only really mean one thing: enable the data.
  4. baked_data: We've been using this term already (although not consistently, the datagen mode is called --format=mod, the testdata provider unstable). I think "baked" is a good description for the data format, but less fitting for the constructors and the concept of compiling in data.
  5. auto_data: I don't think the data being automatically included is the best way to describe this. Clients can still manually set data at compile time, and it's not doing anything fancy like static analysis automatically.
  6. builtin_data: not accurate, as it's possible to use user-generated baked data via an env variable, so no data built into the crate will be used.
  7. globaldata: overloaded with the visibility descriptor in programming, and the data isn't necessarily global, as the user can supply their own (and CLDR doesn't have 100% coverage anyway).

@robertbastian
Copy link
Member

We didn't record a conclusion on whether to make this a default feature in #2945 (comment), but iirc there wasn't any opposition to this proposal.

@sffc
Copy link
Member Author

sffc commented Jun 28, 2023

  • @sffc - (reiterates points made in previous meeting)
  • @robertbastian - People using icu don't care to think about data.
  • @Manishearth - In ICU4C, there's a sensible default for data and it works. Globaldata will be a sensible default of data that will work. The new path sounds very i18n friendly to me overall. So it seems good to push people down that path.
  • @robertbastian - Do we all agree that this should be a default feature?
  • @Manishearth - There are some downsides of the default feature. One is that if people do not want to use it, they need to use --no-default-features which is not great (it causes stability issues, etc). The other downside is compile times.
  • @sffc - The feature isn't enabling third-party things. It's just turning on more pieces of ICU4X. Most users who want to turn it off are probably already in a situation where no-default-features is not difficult for them.
  • @zbraniecki - The alternative is that people need to enable the feature in their Cargo.toml.
  • @robertbastian - You don't need to disable the feature; you can just not use the constructors. If you vendor things, you need to download more things. For the nicer error message, we should do that here but also for all other functions that are hidden behind features.
  • @sffc - I think we should start with constructor names; I think everything else follows. I think it's important to highlight the data provenance. It is a definite improvement from our current experience of needing to read a tutorial about how to build your data. I think we should retain the educational aspect.
  • @Manishearth - I think data provenance is important; but I'm not sure we want to force it on our users / teach our users.
  • @robertbastian - This is not just a novice feature; it is also a power user feature. The others are for like runtime loading of data.
  • @eggrobin - If we say it's good enough for production, it's not a bad default.
  • @zbraniecki - I think we said in 1.0 that we wanted to eventually move toward try_new. What else would we want to be the default constructor? If we think in 2 years that it will probably be globaldata... how does it look? I think the mess of constructors is the biggest source of friction.
  • @robertbastian - It would be really nice if we had a simple try_new amongst the mess of constructors.
  • @sffc - Maybe we should go through the exercise of the bikeshed.
  • @robertbastian - Let's bikeshed the feature name.
  • @eggrobin - embedded_data is what Bazel calls it
  • @sffc - ICU has an API called common_data. Or maybe shared_data.
  • @skius - The downside of embedded is that we're compiling something, not embedding something that already exists
  • @Manishearth - It sounds a little like "embedded programming", the data for Risc-V or something
  • @sffc - Okay, maybe we can agree on this:
  1. Feature compiled_data
  2. Plain constructor names
  3. The constructor docs should say "with compiled data" in the first sentence, which should link to the constructor docs. Still keep the books emoji at the bottom.
  4. The feature is enabled by default and can be enabled/disabled with plumbing through the metacrate

LGTM: @robertbastian @zbraniecki @skius @eggrobin @Manishearth (@sffc)

@robertbastian
Copy link
Member

Outstanding tasks:

  • Verify the whole API surface, specifically:
    • Are all constructors available in all four variants (exception: properties only uses compiled and unstable)
    • Do all try_new constructors
      • mention "with compiled data" in the first sentence
      • mention that the compiled_data feature needs to be enabled
      • link to the constructor docs.
  • Update constructor docs

@robertbastian robertbastian removed the discuss Discuss at a future ICU4X-SC meeting label Jan 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discuss-triaged The stakeholders for this issue have been identified and it can be discussed out-of-band
Projects
None yet
Development

No branches or pull requests

2 participants