Problem with TTFGlyph caching for Arabic Presentation Forms #342
Description
Inside the library, glyph information is stored in objects of type TTFGlyph(id, codePoint, font).
As an optimization, these objects are cached by glyph ID.
This caching causes a problem for Arabic text.
Arabic characters are spread across two Unicode blocks:
- Arabic Block (U+0600 - U+06FF) - the base letters
- Arabic Presentation Forms-A Block (U+FB50 - U+FDFF) - the positional forms of the letters (Isolated, Initial, Medial, Final).
For example, the character "Arabic Letter Peh" (U+067E) and the character "Arabic Letter Peh Isolated Form" (U+FB56) should look the same and therefore map to the same glyph (at least in the font "NotoNaskhArabic.ttf").
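The relationship between the two code points can be checked against the Unicode Character Database. The snippet below is a small Python sketch using the standard `unicodedata` module. It is independent of the library discussed here, and the helper name `describe` is made up for illustration.

```python
import unicodedata

def describe(ch):
    """Return (code point, Unicode name, decomposition) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.decomposition(ch))

# U+067E is the base letter; U+FB56 is its isolated presentation form.
for ch in "\u067E\uFB56":
    print(describe(ch))
# ('U+067E', 'ARABIC LETTER PEH', '')
# ('U+FB56', 'ARABIC LETTER PEH ISOLATED FORM', '<isolated> 067E')
```

The `<isolated> 067E` decomposition shows that U+FB56 is a compatibility form of U+067E, which is why a font may reasonably map both to one glyph.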
The TTFGlyph object for a glyph is created on the first access to that glyph and then cached.
Form selection is performed afterwards, based on the codePoint stored in the TTFGlyph object.
But the forms conversion table only contains character codes from the main block.
So if the first character seen for a glyph comes from the "Presentation Forms" block, that code point is cached, and forms conversion is no longer performed for characters from the main block that map to the same glyph.
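The order dependence can be modelled in a few lines. This is a toy Python sketch of the behaviour described above, not the library's code: the glyph ID `42`, the `CMAP` and `FORMS_TABLE` dictionaries, and the names `Font`, `glyph_for`, `can_convert` and `conversion_flags` are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical glyph ID; the report only says both code points share one glyph.
PEH_GLYPH_ID = 42

# Toy character map: base letter and isolated form resolve to the same glyph.
CMAP = {0x067E: PEH_GLYPH_ID, 0xFB56: PEH_GLYPH_ID}

# Toy forms table, keyed only by main-block code points, as in the report.
FORMS_TABLE = {
    0x067E: {"isolated": 0xFB56, "final": 0xFB57, "initial": 0xFB58, "medial": 0xFB59},
}

@dataclass
class TTFGlyph:
    id: int
    code_point: int

class Font:
    def __init__(self):
        self._cache = {}  # glyph ID -> TTFGlyph

    def glyph_for(self, code_point):
        glyph_id = CMAP[code_point]
        # Cached by glyph ID only, so the first code point seen wins.
        if glyph_id not in self._cache:
            self._cache[glyph_id] = TTFGlyph(glyph_id, code_point)
        return self._cache[glyph_id]

def can_convert(glyph):
    """Forms conversion is possible only if the cached code point is in the table."""
    return glyph.code_point in FORMS_TABLE

def conversion_flags(text):
    """Use a fresh font (empty cache) and report, per character, whether conversion runs."""
    font = Font()
    return [can_convert(font.glyph_for(ord(ch))) for ch in text]

print(conversion_flags("\u067E\uFB56"))  # [True, True]
print(conversion_flags("\uFB56\u067E"))  # [False, False]
```

In this model the second call never converts U+067E, because the cached TTFGlyph carries U+FB56. Keying the cache on the pair (glyph ID, code point), or looking up the forms table with the code point from the input text rather than from the cached object, would remove the order dependence in the sketch. Whether either fits the library's design is for the maintainers to judge.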
Examples
Input data (main-block characters first): "\u067E\u0646 \r\n \uFB56\uFEE5"
Input data (presentation-form characters first): "\uFB56\uFEE5 \r\n \u067E\u0646"