-
Notifications
You must be signed in to change notification settings - Fork 183
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Transliterator DatagenProvider #3877
Transliterator DatagenProvider #3877
Conversation
commit 7f84837 Author: Shane F. Carr <shane@unicode.org> Date: Thu Aug 17 19:31:59 2023 -0500 Update request.rs commit 905e697 Author: Shane F. Carr <shane@unicode.org> Date: Thu Aug 17 16:50:55 2023 -0700 gn-gen commit 1e52541 Author: Shane F. Carr <shane@unicode.org> Date: Thu Aug 17 16:50:06 2023 -0700 Update sizes (probably need to fix nightly sizes too) commit 45778ce Author: Shane F. Carr <shane@unicode.org> Date: Thu Aug 17 16:47:06 2023 -0700 Use the TinyStr variant commit 04ee1df Author: Shane F. Carr <shane@unicode.org> Date: Thu Aug 17 16:42:26 2023 -0700 Remove Default impl commit 3d83b95 Author: Shane F. Carr <shane@unicode.org> Date: Thu Aug 17 16:42:02 2023 -0700 Add impls based on Deref commit 3b14fbd Author: Shane F. Carr <shane@unicode.org> Date: Thu Aug 17 16:10:47 2023 -0700 Add 3-way enum (not using Stack variant yet) commit cda1498 Author: Shane F. Carr <shane@unicode.org> Date: Thu Aug 17 13:57:26 2023 -0700 Fix build post merge commit 71c1087 Author: Shane F. Carr <shane@unicode.org> Date: Thu Aug 17 13:56:30 2023 -0700 Support multipart auxiliary keys commit f31643a Author: Shane F. Carr <shane@unicode.org> Date: Thu Aug 17 13:17:17 2023 -0700 Use `+` for aux key commit dbefbf9 Author: Shane F. Carr <shane@unicode.org> Date: Thu Aug 17 13:04:55 2023 -0700 More more is_und commit f8e310e Author: Shane F. Carr <shane@unicode.org> Date: Thu Aug 17 13:03:07 2023 -0700 is_und in driver.rs (post merge cleanup) commit 33fad72 Merge: 5551d6b ea1ba9f Author: Shane F. Carr <shane@unicode.org> Date: Thu Aug 17 12:58:16 2023 -0700 Merge branch 'main' into auxkey Conflicts: provider/datagen/src/lib.rs commit 5551d6b Author: Shane F. Carr <shane@unicode.org> Date: Wed Aug 16 17:07:24 2023 -0700 fmt and refactor commit cd0d86c Author: Shane F. Carr <shane@unicode.org> Date: Wed Aug 16 17:00:56 2023 -0700 Fix baked_exporter.rs commit 241b563 Author: Shane F. Carr <shane@unicode.org> Date: Wed Aug 16 14:51:39 2023 -0700 Pull the separator character into a function where possible. commit 93f0969 Author: Shane F. Carr <shane@unicode.org> Date: Wed Aug 16 14:47:19 2023 -0700 Handle $ in baked_exporter commit 3493b75 Author: Shane F. Carr <shane@unicode.org> Date: Tue Aug 15 19:24:09 2023 -0700 Use is_und instead of is_empty more consistently in fallback iterator commit 52ab723 Author: Shane F. Carr <shane@unicode.org> Date: Tue Aug 15 19:19:23 2023 -0700 Add DataLocale::is_und; gen hello world testdata commit f36b405 Author: Shane F. Carr <shane@unicode.org> Date: Tue Aug 15 19:05:19 2023 -0700 Use auxiliary keys in HelloWorldProvider commit 54101ad Author: Shane F. Carr <shane@unicode.org> Date: Tue Aug 15 18:57:06 2023 -0700 Forbid the empty string in AuxiliaryKey commit d0e30e2 Author: Shane F. Carr <shane@unicode.org> Date: Tue Aug 15 18:54:44 2023 -0700 Docs, tests, cleanup commit 06e453c Author: Shane F. Carr <shane@unicode.org> Date: Tue Aug 15 18:29:18 2023 -0700 Change custom error type to DataError::KeyLocaleSyntax commit 6cb45f6 Author: Shane F. Carr <shane@unicode.org> Date: Tue Aug 15 18:18:00 2023 -0700 Make comparison operations work commit 59aa0b4 Author: Shane F. Carr <shane@unicode.org> Date: Tue Aug 15 15:55:52 2023 -0700 Start writing impl FromStr for DataLocale commit 3d36855 Author: Shane F. Carr <shane@unicode.org> Date: Tue Aug 15 13:23:16 2023 -0700 Initial auxiliary key APIs
🎉 All dependencies have been resolved ! |
This fails CI because the sample transform rule directory ( |
In https://github.com/unicode-org/icu4x/blob/main/tools/testdata-scripts/src/bin/download-repo-sources.rs#L121 you need to move |
"README.md" | ||
] | ||
|
||
[dependencies] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: remove
#[derive( | ||
Debug, | ||
Clone, | ||
zerofrom::ZeroFrom, // QUESTION: does ZeroFrom need to be behind a feature? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's required by DataPayload::with_mut
and DataPayload::try_map_project
, which is used by .as_deserializing()
only, so it could be behind the serde
feature. However, it's a dependency of icu_provider
, so making it optional doesn't gain much, and we don't have it behind a feature anywhere.
#[doc(hidden)] | ||
pub fn legacy_id_to_internal_id(source: &str, target: &str, variant: Option<&str>) -> String { | ||
// TODO(#3891): Decide representation for unknown BCP47 IDs | ||
let mut id = format!("x-{source}-{target}"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
praise: good solution for now
let dir_suffix = self.0.dir_suffix()?; | ||
let path = format!("{}-{dir_suffix}/main/{transform}/metadata.json", self.1); | ||
if self.0.serde_cache.file_exists(&path)? { | ||
self.0.serde_cache.read_and_parse_json(&path) | ||
} else { | ||
Err(DataErrorKind::Io(std::io::ErrorKind::NotFound) | ||
.into_error() | ||
.with_display_context(&path)) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should be able to construct a CldrDirLang
on the fly to call its read_and_parse
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume you meant CldrDirNoLang
, because there is no language identifier around. Doing this now:
let cldr_dir = CldrDirNoLang(self.0, format!("{}-{dir_suffix}/main/{transform}", self.1));
cldr_dir.read_and_parse("metadata.json")
let dir_suffix = self.0.dir_suffix()?; | ||
let path = format!("{}-{dir_suffix}/main/{transform}/source.txt", self.1); | ||
if self.0.serde_cache.file_exists(&path)? { | ||
let buf = self.0.serde_cache.root.read_to_buf(&path)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change read_to_buf
by read_to_string
, then the AbstractFs
can handle UTF-8 errors. JSON and TOML deserialization can be done on strings as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added read_to_string
to AbstractFs
fn lookup_dir_from_internal_id( | ||
&self, | ||
internal_id: &str, | ||
) -> Result<Option<(String, bool)>, DataError> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nicer to return icu_transliterator_parser::Direction
|
||
fn find_bcp47_in_list(list: &[String]) -> Option<String> { | ||
for item in list { | ||
if item.contains("-t-") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might be better to parse to Locale
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Parsing aliases as an enum over Locale (for -t-) and String (for legacy IDs) now
] | ||
}, | ||
{ | ||
"cp_inv_list": [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh these are quite big 😬
57e1af3
to
fb3a112
Compare
Depends on #3872
Part of #3736
This PR