Petition the spec to consider non-ascii word boundaries #1225
Comments
Also worth noting: it's probably easy enough to add a Cargo feature for opting into this (currently not spec-compliant) behavior.
It's worth noting that Unicode defines word boundaries: https://unicode.org/reports/tr29/
The regex crate also links to this part of Unicode about regexes: https://www.unicode.org/reports/tr18/#Compatibility_Properties I think that to ask for this change in the spec, we have to link to a specification of some kind (like these from Unicode) instead of just saying "non-ASCII". It should also be noted that the current Synapse implementation already appears to use Unicode word boundaries.
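To make the difference concrete, here is a minimal sketch using Python's standard `re` module. It is not Synapse's or the regex crate's actual code; the helper `keyword_matches` and the sample strings are hypothetical. It also assumes Python 3 `str` patterns, where `\b` is defined in terms of `\w` (Unicode-aware by default, ASCII-only with `re.ASCII`), which is closer to the TR18 compatibility properties than to full TR29 word segmentation.

```python
import re


def keyword_matches(keyword: str, body: str, ascii_only: bool) -> bool:
    """Return True if `keyword` occurs in `body` delimited by word boundaries.

    Hypothetical helper for illustration only. With ascii_only=True, only
    [A-Za-z0-9_] count as word characters; otherwise any Unicode
    alphanumeric character (and underscore) does.
    """
    flags = re.IGNORECASE
    if ascii_only:
        flags |= re.ASCII
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return re.search(pattern, body, flags) is not None


# A keyword ending in a non-ASCII letter: under ASCII rules there is no
# boundary between "é" and the following space, so the match is missed.
print(keyword_matches("caf\u00e9", "a caf\u00e9 here", ascii_only=True))
print(keyword_matches("caf\u00e9", "a caf\u00e9 here", ascii_only=False))

# An ASCII keyword inside a longer non-ASCII word: under ASCII rules "é"
# counts as a separator, so "cole" is found inside "école".
print(keyword_matches("cole", "\u00e9cole", ascii_only=True))
print(keyword_matches("cole", "\u00e9cole", ascii_only=False))
```

Under these assumptions, ASCII-only boundaries both miss keywords that end in non-ASCII letters and report matches inside longer non-ASCII words, which is the behavior this issue asks the spec to reconsider.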
Could you look at how long Synapse has used that word boundary definition? Then it might be eligible to be grandfathered into the spec instead of requiring an MSC.
It looks like this commit from January 5th made it support Unicode. There was a comment saying it only supported ASCII before, although I'm not sure why, because the regex didn't change much between the two versions. Maybe it also changed with a Python version change?
Originally posted by @jplatte in #1224 (review)
I'm noting this in a separate issue to keep these thoughts organised and to address them at a later date.