Transliterator DatagenProvider #3877

Merged
merged 43 commits into from
Aug 23, 2023
Changes from 1 commit
Commits
8b12ce1
add sample transliterator data (de-ASCII)
skius Aug 16, 2023
b327e5e
DatagenProvider for Transform Rules
skius Aug 16, 2023
7d74393
add todos
skius Aug 16, 2023
68bd038
Merge branch 'main' into datagen-transform-rules
skius Aug 18, 2023
73e86a8
Squashed commit of the following:
skius Aug 18, 2023
6edbd3c
hook up the parser to datagen
skius Aug 18, 2023
1c0de7c
add zerovec:databake dep
skius Aug 18, 2023
9f267d4
clippy
skius Aug 18, 2023
ad73a31
use collections databake feature
skius Aug 18, 2023
0158d6b
add more sample transforms
skius Aug 18, 2023
521e52c
add feature for cargo make testdata
skius Aug 18, 2023
432a2be
(wip) broken human-readable datagen due to InterIndic-Arabic's CPIL u…
skius Aug 18, 2023
d4b3bb2
make data serialize to json
skius Aug 18, 2023
7590a32
use available transliterator mapping in compilation
skius Aug 18, 2023
69f8005
regenerate testdata
skius Aug 18, 2023
3de8cd5
use visibility information
skius Aug 18, 2023
181ced6
regenerate testdata
skius Aug 18, 2023
8434e85
remove todo
skius Aug 18, 2023
9cc37ec
add todo
skius Aug 18, 2023
83696c6
add issue number to crate feature todo
skius Aug 18, 2023
1ee858f
add Greek-Latin/BGN
skius Aug 18, 2023
84f9bef
Merge branch 'main' into datagen-transform-rules
skius Aug 19, 2023
f78635b
fix tests and clippy
skius Aug 19, 2023
ac6e142
add correct cfg for transform-specific features
skius Aug 19, 2023
5804081
add direct dependencies to datastruct
skius Aug 19, 2023
f596a56
regenerate testdata
skius Aug 19, 2023
2d06388
add features to icu_transliteration
skius Aug 19, 2023
98b605c
add transliteration/compileddata?
skius Aug 19, 2023
c237a24
make compileddata compile
skius Aug 19, 2023
adf5907
compiled data for transliteration
skius Aug 19, 2023
09432d9
remove todo
skius Aug 19, 2023
5d517fe
download-repo-sources workaround for transform rules
skius Aug 21, 2023
3ec878e
Debug for CPILULE
skius Aug 21, 2023
299d17a
Debug for CPILASLULE
skius Aug 21, 2023
0cb15c3
addressing first part of review
skius Aug 22, 2023
273cb25
transform aliases parse to either Locale or String, normalize legacy IDs
skius Aug 22, 2023
3b249bb
regenerate testdata
skius Aug 22, 2023
0fa9929
fix tests
skius Aug 22, 2023
6566d63
Merge branch 'main' into datagen-transform-rules
skius Aug 22, 2023
fb3a112
fix license
skius Aug 22, 2023
1f9da83
regenerate baked data
skius Aug 22, 2023
7312e8c
Merge branch 'main' into datagen-transform-rules
skius Aug 22, 2023
bba758c
regenerate testdata
skius Aug 22, 2023
Squashed commit of the following:
commit 7f84837
Author: Shane F. Carr <shane@unicode.org>
Date:   Thu Aug 17 19:31:59 2023 -0500

    Update request.rs

commit 905e697
Author: Shane F. Carr <shane@unicode.org>
Date:   Thu Aug 17 16:50:55 2023 -0700

    gn-gen

commit 1e52541
Author: Shane F. Carr <shane@unicode.org>
Date:   Thu Aug 17 16:50:06 2023 -0700

    Update sizes (probably need to fix nightly sizes too)

commit 45778ce
Author: Shane F. Carr <shane@unicode.org>
Date:   Thu Aug 17 16:47:06 2023 -0700

    Use the TinyStr variant

commit 04ee1df
Author: Shane F. Carr <shane@unicode.org>
Date:   Thu Aug 17 16:42:26 2023 -0700

    Remove Default impl

commit 3d83b95
Author: Shane F. Carr <shane@unicode.org>
Date:   Thu Aug 17 16:42:02 2023 -0700

    Add impls based on Deref

commit 3b14fbd
Author: Shane F. Carr <shane@unicode.org>
Date:   Thu Aug 17 16:10:47 2023 -0700

    Add 3-way enum (not using Stack variant yet)

commit cda1498
Author: Shane F. Carr <shane@unicode.org>
Date:   Thu Aug 17 13:57:26 2023 -0700

    Fix build post merge

commit 71c1087
Author: Shane F. Carr <shane@unicode.org>
Date:   Thu Aug 17 13:56:30 2023 -0700

    Support multipart auxiliary keys

commit f31643a
Author: Shane F. Carr <shane@unicode.org>
Date:   Thu Aug 17 13:17:17 2023 -0700

    Use `+` for aux key

commit dbefbf9
Author: Shane F. Carr <shane@unicode.org>
Date:   Thu Aug 17 13:04:55 2023 -0700

    More more is_und

commit f8e310e
Author: Shane F. Carr <shane@unicode.org>
Date:   Thu Aug 17 13:03:07 2023 -0700

    is_und in driver.rs (post merge cleanup)

commit 33fad72
Merge: 5551d6b ea1ba9f
Author: Shane F. Carr <shane@unicode.org>
Date:   Thu Aug 17 12:58:16 2023 -0700

    Merge branch 'main' into auxkey

    Conflicts:
    	provider/datagen/src/lib.rs

commit 5551d6b
Author: Shane F. Carr <shane@unicode.org>
Date:   Wed Aug 16 17:07:24 2023 -0700

    fmt and refactor

commit cd0d86c
Author: Shane F. Carr <shane@unicode.org>
Date:   Wed Aug 16 17:00:56 2023 -0700

    Fix baked_exporter.rs

commit 241b563
Author: Shane F. Carr <shane@unicode.org>
Date:   Wed Aug 16 14:51:39 2023 -0700

    Pull the separator character into a function where possible.

commit 93f0969
Author: Shane F. Carr <shane@unicode.org>
Date:   Wed Aug 16 14:47:19 2023 -0700

    Handle $ in baked_exporter

commit 3493b75
Author: Shane F. Carr <shane@unicode.org>
Date:   Tue Aug 15 19:24:09 2023 -0700

    Use is_und instead of is_empty more consistently in fallback iterator

commit 52ab723
Author: Shane F. Carr <shane@unicode.org>
Date:   Tue Aug 15 19:19:23 2023 -0700

    Add DataLocale::is_und; gen hello world testdata

commit f36b405
Author: Shane F. Carr <shane@unicode.org>
Date:   Tue Aug 15 19:05:19 2023 -0700

    Use auxiliary keys in HelloWorldProvider

commit 54101ad
Author: Shane F. Carr <shane@unicode.org>
Date:   Tue Aug 15 18:57:06 2023 -0700

    Forbid the empty string in AuxiliaryKey

commit d0e30e2
Author: Shane F. Carr <shane@unicode.org>
Date:   Tue Aug 15 18:54:44 2023 -0700

    Docs, tests, cleanup

commit 06e453c
Author: Shane F. Carr <shane@unicode.org>
Date:   Tue Aug 15 18:29:18 2023 -0700

    Change custom error type to DataError::KeyLocaleSyntax

commit 6cb45f6
Author: Shane F. Carr <shane@unicode.org>
Date:   Tue Aug 15 18:18:00 2023 -0700

    Make comparison operations work

commit 59aa0b4
Author: Shane F. Carr <shane@unicode.org>
Date:   Tue Aug 15 15:55:52 2023 -0700

    Start writing impl FromStr for DataLocale

commit 3d36855
Author: Shane F. Carr <shane@unicode.org>
Date:   Tue Aug 15 13:23:16 2023 -0700

    Initial auxiliary key APIs
skius committed Aug 18, 2023
commit 73e86a84992f183220a5811e86e57aa5f7e21697
1 change: 1 addition & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.


4 changes: 2 additions & 2 deletions components/datetime/src/lib.rs
@@ -206,9 +206,9 @@ mod tests {
     fn check_sizes() {
         check_size_of!(5800 | 4632, DateFormatter);
         check_size_of!(6792 | 5504, DateTimeFormatter);
-        check_size_of!(7904 | 6528, ZonedDateTimeFormatter);
+        check_size_of!(7904 | 6552, ZonedDateTimeFormatter);
         check_size_of!(1496 | 1344, TimeFormatter);
-        check_size_of!(1112 | 1024, TimeZoneFormatter);
+        check_size_of!(1112 | 1048, TimeZoneFormatter);
         check_size_of!(5752 | 4584, TypedDateFormatter::<Gregorian>);
         check_size_of!(6744 | 5456, TypedDateTimeFormatter::<Gregorian>);

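The size checks in this hunk each list two byte counts per type. The sketch below shows one way such a check could work, under the assumption that the assertion passes when the type's size equals either number (the squashed commit message mentions separate nightly sizes, which suggests one value per toolchain, but that is not confirmed here). `check_size_of_sketch!`, `Sample`, and `check_sizes_sketch` are hypothetical names, not the macro used in `icu_datetime`.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for `check_size_of!`: passes when the size of `$t`
/// equals either of the two listed byte counts.
macro_rules! check_size_of_sketch {
    ($a:literal | $b:literal, $t:ty) => {
        let size = size_of::<$t>();
        assert!(
            size == $a || size == $b,
            "size_of {} = {}, expected {} or {}",
            stringify!($t),
            size,
            $a,
            $b
        );
    };
}

/// Example type (hypothetical). Its size depends on the alignment of `u64`:
/// 16 bytes where `u64` is 8-byte aligned, 12 bytes where it is 4-byte aligned.
pub struct Sample {
    pub a: u64,
    pub b: u32,
}

/// Runs the sketch checks; panics if a size matches neither alternative.
pub fn check_sizes_sketch() {
    check_size_of_sketch!(8 | 4, u64);
    check_size_of_sketch!(16 | 12, Sample);
}
```

Listing both accepted values keeps a single test usable across two build configurations, at the cost of not stating which configuration produced which number.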


1 change: 1 addition & 0 deletions ffi/gn/Cargo.lock

1 change: 1 addition & 0 deletions ffi/gn/icu4x/BUILD.gn
@@ -558,6 +558,7 @@ rust_library("icu_provider-v1_2_0") {
   deps += [ ":icu_locid-v1_2_0" ]
   deps += [ ":icu_provider_macros-v1_2_0($host_toolchain)" ]
   deps += [ ":stable_deref_trait-v1_2_0" ]
+  deps += [ ":tinystr-v0_7_1" ]
   deps += [ ":writeable-v0_5_2" ]
   deps += [ ":yoke-v0_7_1" ]
   deps += [ ":zerofrom-v0_1_2" ]
2 changes: 1 addition & 1 deletion provider/adapters/src/fallback/mod.rs
@@ -223,7 +223,7 @@ impl<P> LocaleFallbackProvider<P> {
                 });
             }
             // If we just checked und, break out of the loop.
-            if fallback_iterator.get().is_empty() {
+            if fallback_iterator.get().is_und() {
                 break;
             }
             fallback_iterator.step();
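The change from `is_empty()` to `is_und()` matters once a locale can carry an auxiliary key. The following is a minimal sketch of why, assuming that `is_und` ignores the auxiliary key while `is_empty` requires every field to be empty, and that a fallback step keeps the auxiliary key. `ToyLocale` and `fallback_chain` are hypothetical and do not reproduce the `icu_provider` or `LocaleFallbackProvider` APIs.

```rust
/// Toy stand-in for a data locale: language-tag subtags plus an optional
/// auxiliary key. Hypothetical type, not `icu_provider::DataLocale`.
#[derive(Debug, Clone, PartialEq)]
pub struct ToyLocale {
    /// Subtags such as ["en", "US"]; an empty list stands for `und`.
    pub subtags: Vec<String>,
    /// Auxiliary key such as Some("reverse").
    pub aux: Option<String>,
}

impl ToyLocale {
    /// True only when there is neither a language tag nor an auxiliary key.
    pub fn is_empty(&self) -> bool {
        self.subtags.is_empty() && self.aux.is_none()
    }

    /// True when the language-tag portion is `und`; the auxiliary key is ignored.
    pub fn is_und(&self) -> bool {
        self.subtags.is_empty()
    }

    /// One fallback step: drop the last subtag and keep the auxiliary key
    /// (an assumption of this sketch).
    pub fn step(&mut self) {
        self.subtags.pop();
    }
}

/// Walks the fallback chain and returns every locale visited, ending at `und`.
pub fn fallback_chain(mut locale: ToyLocale) -> Vec<ToyLocale> {
    let mut visited = Vec::new();
    loop {
        visited.push(locale.clone());
        // Break once `und` has been checked. With `is_empty()` this condition
        // would never hold while an auxiliary key is attached.
        if locale.is_und() {
            break;
        }
        locale.step();
    }
    visited
}
```

Starting from `en-US` with the auxiliary key `reverse`, the chain in this sketch visits three locales and stops at one that is `und` but not empty.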
Binary file modified provider/adapters/tests/data/blob.postcard
Binary file not shown.
Binary file modified provider/blob/tests/data/hello_world.postcard
Binary file not shown.
1 change: 1 addition & 0 deletions provider/core/Cargo.toml
@@ -32,6 +32,7 @@ all-features = true
 displaydoc = { version = "0.2.3", default-features = false }
 icu_locid = { version = "1.2.0", path = "../../components/locid" }
 stable_deref_trait = { version = "1.2.0", default-features = false }
+tinystr = { version = "0.7.1", path = "../../utils/tinystr" }
 writeable = { version = "0.5.1", path = "../../utils/writeable" }
 yoke = { version = "0.7.1", path = "../../utils/yoke", features = ["derive"] }
 zerofrom = { version = "0.1.1", path = "../../utils/zerofrom", features = ["derive"] }
4 changes: 4 additions & 0 deletions provider/core/src/error.rs
@@ -47,6 +47,10 @@ pub enum DataErrorKind {
     #[displaydoc("Invalid state")]
     InvalidState,

+    /// The syntax of the [`DataKey`] or [`DataLocale`] was invalid.
+    #[displaydoc("Parse error for data key or data locale")]
+    KeyLocaleSyntax,
+
     /// An unspecified error occurred, such as a Serde error.
     ///
     /// Check debug logs for potentially more information.
29 changes: 24 additions & 5 deletions provider/core/src/hello_world.rs
@@ -76,6 +76,25 @@ impl KeyedDataMarker for HelloWorldV1Marker {
 ///
 /// assert_eq!("Hallo Welt", german_hello_world.get().message);
 /// ```
+///
+/// Load the reverse string using an auxiliary key:
+///
+/// ```
+/// use icu_provider::hello_world::*;
+/// use icu_provider::prelude::*;
+///
+/// let reverse_hello_world: DataPayload<HelloWorldV1Marker> =
+///     HelloWorldProvider
+///         .load(DataRequest {
+///             locale: &"en+reverse".parse().unwrap(),
+///             metadata: Default::default(),
+///         })
+///         .expect("Loading should succeed")
+///         .take_payload()
+///         .expect("Data should be present");
+///
+/// assert_eq!("Olleh Dlrow", reverse_hello_world.get().message);
+/// ```
 #[derive(Debug, PartialEq, Default)]
 pub struct HelloWorldProvider;

@@ -88,11 +107,13 @@ impl HelloWorldProvider {
         ("de", "Hallo Welt"),
         ("el", "Καλημέρα κόσμε"),
         ("en", "Hello World"),
+        ("en+reverse", "Olleh Dlrow"),
         ("eo", "Saluton, Mondo"),
         ("fa", "سلام دنیا‎"),
         ("fi", "hei maailma"),
         ("is", "Halló, heimur"),
         ("ja", "こんにちは世界"),
+        ("ja+reverse", "界世はちにんこ"),
         ("la", "Ave, munde"),
         ("pt", "Olá, mundo"),
         ("ro", "Salut, lume"),
@@ -190,11 +211,7 @@ impl BufferProvider for HelloWorldJsonProvider {
 impl icu_provider::datagen::IterableDataProvider<HelloWorldV1Marker> for HelloWorldProvider {
     fn supported_locales(&self) -> Result<Vec<DataLocale>, DataError> {
         #[allow(clippy::unwrap_used)] // datagen
-        Ok(Self::DATA
-            .iter()
-            .map(|(s, _)| s.parse::<icu_locid::LanguageIdentifier>().unwrap())
-            .map(DataLocale::from)
-            .collect())
+        Ok(Self::DATA.iter().map(|(s, _)| s.parse().unwrap()).collect())
     }
 }

@@ -309,11 +326,13 @@ fn test_iter() {
             locale!("de").into(),
             locale!("el").into(),
             locale!("en").into(),
+            "en+reverse".parse().unwrap(),
             locale!("eo").into(),
             locale!("fa").into(),
             locale!("fi").into(),
             locale!("is").into(),
             locale!("ja").into(),
+            "ja+reverse".parse().unwrap(),
             locale!("la").into(),
             locale!("pt").into(),
             locale!("ro").into(),
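The hello_world.rs changes rely on strings such as `en+reverse` parsing directly into a data locale, with syntax failures reported as `KeyLocaleSyntax`. The sketch below illustrates that idea with standalone types. `SketchLocale`, `SketchError`, and `supported_locales` are hypothetical; the validation rules (non-empty language part made of ASCII alphanumerics and `-`, non-empty auxiliary key after the first `+`) are simplifying assumptions, and multipart auxiliary keys are not modeled.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical error type; it only mirrors the name and message of the new
/// `KeyLocaleSyntax` error kind.
#[derive(Debug, PartialEq)]
pub enum SketchError {
    KeyLocaleSyntax,
}

impl fmt::Display for SketchError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        f.write_str("Parse error for data key or data locale")
    }
}

/// Toy data locale: the language-tag text plus an optional auxiliary key.
#[derive(Debug, PartialEq)]
pub struct SketchLocale {
    pub langid: String,
    pub aux: Option<String>,
}

impl FromStr for SketchLocale {
    type Err = SketchError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Split at the first `+`; everything after it is the auxiliary key.
        let (langid, aux) = match s.split_once('+') {
            Some((langid, aux)) => (langid, Some(aux)),
            None => (s, None),
        };
        // Assumed rule: the language part is non-empty ASCII alphanumerics and `-`.
        let langid_ok = !langid.is_empty()
            && langid.bytes().all(|b| b.is_ascii_alphanumeric() || b == b'-');
        // The empty string is rejected as an auxiliary key.
        let aux_ok = aux.map_or(true, |a| !a.is_empty());
        if !(langid_ok && aux_ok) {
            return Err(SketchError::KeyLocaleSyntax);
        }
        Ok(SketchLocale {
            langid: langid.to_string(),
            aux: aux.map(|a| a.to_string()),
        })
    }
}

/// Mirrors the shape of the simplified `supported_locales`: every key in the
/// table parses straight into a locale, with or without an auxiliary key.
pub fn supported_locales(data: &[(&str, &str)]) -> Vec<SketchLocale> {
    data.iter().map(|(s, _)| s.parse().unwrap()).collect()
}
```

Because the target type implements `FromStr`, a table that mixes plain entries (`"en"`) and auxiliary-key entries (`"en+reverse"`) can go through one `parse()` call, which is what lets the two-step `LanguageIdentifier` conversion collapse into a single `map`.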
3 changes: 3 additions & 0 deletions provider/core/src/lib.rs
@@ -161,6 +161,7 @@ pub use crate::key::DataKey;
 pub use crate::key::DataKeyHash;
 pub use crate::key::DataKeyMetadata;
 pub use crate::key::DataKeyPath;
+pub use crate::request::AuxiliaryKeys;
 pub use crate::request::DataLocale;
 pub use crate::request::DataRequest;
 pub use crate::request::DataRequestMetadata;
@@ -206,6 +207,8 @@ pub mod prelude {
     #[doc(no_inline)]
     pub use crate::AsDynamicDataProviderAnyMarkerWrap;
     #[doc(no_inline)]
+    pub use crate::AuxiliaryKeys;
+    #[doc(no_inline)]
     pub use crate::BufferMarker;
     #[doc(no_inline)]
     pub use crate::BufferProvider;