Move LB8a and LB9 out of the table #5001
Merged
Hopefully no functional change.
Last time I attempted to look at Unicode 15.1 line breaking, that was made impractical by the need, for every new state X, to add an X_ZWJ state, transitions X CM → X, X ZWJ → X_ZWJ, X_ZWJ CM → X, as well as X_ZWJ Y → Z for every transition X Y → Z, and to add or update rules to prevent breaks after X_ZWJ.
Hopefully this will make that upgrade a little more tractable. (Incidentally it makes the state table a bit smaller.)
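To make the per-state overhead described above concrete, here is a minimal sketch that enumerates the extra transitions a new state X would drag in when LB8a and LB9 are handled inside the table. The `Transition` type and the `t` and `zwj_overhead` helpers are hypothetical names invented for this illustration; they are not the actual data model or API of the segmenter, and the additional rules needed to prevent breaks after X_ZWJ are not modelled.

```rust
use std::collections::BTreeSet;

/// One table entry `from class -> to` in a simplified, hypothetical rule table.
/// This is an illustration only, not the real segmenter data model.
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct Transition {
    pub from: String,
    pub class: String,
    pub to: String,
}

/// Convenience constructor (hypothetical helper).
pub fn t(from: &str, class: &str, to: &str) -> Transition {
    Transition {
        from: from.to_string(),
        class: class.to_string(),
        to: to.to_string(),
    }
}

/// Given the transitions `X Y -> Z` leaving a new state `x`, list the extra
/// transitions required when CM and ZWJ handling lives inside the table.
pub fn zwj_overhead(x: &str, outgoing: &[Transition]) -> BTreeSet<Transition> {
    let x_zwj = format!("{}_ZWJ", x);
    let mut extra = BTreeSet::new();
    extra.insert(t(x, "CM", x)); // X CM -> X
    extra.insert(t(x, "ZWJ", &x_zwj)); // X ZWJ -> X_ZWJ
    extra.insert(t(&x_zwj, "CM", x)); // X_ZWJ CM -> X
    // X_ZWJ Y -> Z for every existing X Y -> Z
    for tr in outgoing.iter().filter(|tr| tr.from == x) {
        extra.insert(t(&x_zwj, &tr.class, &tr.to));
    }
    extra
}
```

Under these assumptions, a state with n outgoing transitions costs n + 3 additional table entries plus one additional state, which is the bookkeeping that moving LB8a and LB9 out of the table is meant to avoid.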
Tested with 200 000 monkeys (recall that only 200 are checked in).
Related to #3255; see my comment there for the rationale.
Aside: While looking at this, it came to my attention that the `LineBreakStrictness::Anywhere` option does not do what the standard says, cf. https://drafts.csswg.org/css-text-3/#valdef-line-break-anywhere and https://drafts.csswg.org/css-text-3/#typographic-character-unit referenced therein. Of course, we do have a correct implementation of `line-break: anywhere`, since we have a grapheme cluster segmenter.