Petition the spec to consider non-ascii word boundaries #1225

ShadowJonathan · 2022-06-24T13:36:54Z

I really don't like all of this stuff being defined on ASCII; it degrades the experience for non-English usage (especially where the language used doesn't use a variation of the Latin alphabet).

But that's what the spec says and just deviating from it w/o trying to change it is also pretty bad and would be against existing policy.

Originally posted by @jplatte in #1224 (review)

I'm noting this in a separate issue to keep these thoughts organised and to address them at a later date.

jplatte · 2022-06-24T13:43:22Z

Also worth noting that it's probably easy enough to make a Cargo feature for opting into this (currently not spec-compliant) behavior.

ShadowJonathan · 2022-06-24T16:11:11Z

It's worth noting that Unicode defines word boundaries: https://unicode.org/reports/tr29/

zecakeh · 2022-06-24T16:14:10Z

The regex crate also links to this part of Unicode about regexes: https://www.unicode.org/reports/tr18/#Compatibility_Properties

I think that when asking for this change to the spec, we have to link to a spec of some kind (like these from Unicode), instead of just saying "non-ASCII".

It should also be noted that the current Synapse implementation appears to already use Unicode word boundaries.
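To illustrate the difference, here is a minimal sketch using only Python's standard `re` module. It is not Synapse's actual code, and `matches_word` is a hypothetical helper made up for this example. Note also that `\b` in `re` is defined in terms of `\w`/`\W`, so it only approximates the word segmentation described in UAX #29.

```python
import re


def matches_word(word: str, body: str, *, ascii_only: bool) -> bool:
    """Return True if `word` occurs in `body` delimited by word boundaries.

    Hypothetical helper for illustration only. With `ascii_only=True`,
    `\\w` (and therefore `\\b`) only treats [a-zA-Z0-9_] as word characters.
    Otherwise Python 3 uses Unicode word characters for str patterns.
    """
    flags = re.ASCII if ascii_only else 0
    regex = re.compile(r"\b" + re.escape(word) + r"\b", flags)
    return regex.search(body) is not None


# ASCII boundaries miss a word that starts and ends with a non-ASCII letter,
# because "é" counts as a non-word character on both sides of the boundary.
print(matches_word("été", "cet été là", ascii_only=True))
print(matches_word("été", "cet été là", ascii_only=False))

# ASCII boundaries can also match inside a word: "caf" is followed by "é",
# which ASCII mode treats as a non-word character.
print(matches_word("caf", "café", ascii_only=True))
print(matches_word("caf", "café", ascii_only=False))
```

With ASCII-only boundaries the first call fails to find "été" and the third call wrongly finds "caf" inside "café"; with Unicode boundaries both behave as a user would expect.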

ShadowJonathan · 2022-06-24T16:29:21Z

Could you look at how long Synapse has used that word boundary definition? Then it might be eligible to be grandfathered into the spec, instead of requiring an MSC.

zecakeh · 2022-06-24T16:44:33Z

It looks like this commit from January 5th made it support Unicode.

There was a comment about it only supporting ASCII before, although I'm not sure why, because the regex didn't change much between the two versions. Maybe it also changed with a Python version change?

jplatte changed the title ~~Petition the spec to consider non-ascii word boundries~~ Petition the spec to consider non-ascii word boundaries Jun 24, 2022
