Problem with TTFGlyph caching for Arabic Presentation Forms  #342

Open
@IshmaZX82

Description

Inside the library, glyph information is stored in objects of type TTFGlyph(id, codePoint, font).
As an optimization, these objects are cached by glyph ID.
This caching causes a problem for Arabic text.

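The Arabic characters involved here come from the following Unicode blocks:

  • Arabic (U+0600 - U+06FF): the base letters
  • Arabic Presentation Forms-A (U+FB50 - U+FDFF): positional forms of letters (isolated, initial, medial, final)
  • Arabic Presentation Forms-B (U+FE70 - U+FEFF): further positional forms, including U+FEE5 used in the examples below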

For example, "Arabic Letter Peh" (U+067E) and "Arabic Letter Peh Isolated Form" (U+FB56) should look the same and therefore share a glyph (at least in the font "NotoNaskhArabic.ttf").
The TTFGlyph object for a glyph is created on first access to that glyph and then cached.
Shaping into positional forms is performed afterwards, based on the codePoint stored in the TTFGlyph object.
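
The relationship between the base letters and their presentation forms can be checked independently of the library. The snippet below is only an illustration using Python's standard `unicodedata` module; `describe` is a hypothetical helper name, and nothing here is part of the library's API.

```python
import unicodedata


def describe(code_point: int) -> tuple[str, str]:
    """Return the Unicode name and compatibility decomposition of a code point."""
    ch = chr(code_point)
    return unicodedata.name(ch), unicodedata.decomposition(ch)


# The four code points used in this issue: two base letters and
# their isolated presentation forms.
for cp in (0x067E, 0xFB56, 0x0646, 0xFEE5):
    name, decomposition = describe(cp)
    print(f"U+{cp:04X} {name} [{decomposition or 'no decomposition'}]")

# NFKC normalization folds presentation forms back to the base letters.
folded = unicodedata.normalize("NFKC", "\uFB56\uFEE5")
print([f"U+{ord(ch):04X}" for ch in folded])
```

The presentation forms carry an `<isolated>` compatibility decomposition that points back to the base letter, which is why a font can reasonably map both code points to one glyph.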

But the forms conversion table contains only character codes from the main Arabic block.
So if the first character that resolves to a glyph comes from a "Presentation Forms" block, that code point is cached in the TTFGlyph, and forms conversion is no longer performed for the main-block character that maps to the same glyph.
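
The order dependence described above can be reproduced with a small model. This is a minimal sketch, not the library's code: `CMAP`, `FORMS_TABLE`, `GlyphCache`, `shapeable`, the glyph IDs, and the `include_code_point` switch are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TTFGlyph:
    glyph_id: int
    code_point: int


# Assumed font mapping: base letter and isolated form share one glyph.
CMAP = {0x067E: 10, 0xFB56: 10, 0x0646: 20, 0xFEE5: 20}

# Assumed forms conversion table: main Arabic block only.
FORMS_TABLE = {0x067E, 0x0646}


class GlyphCache:
    def __init__(self, include_code_point: bool = False):
        self._glyphs = {}
        self._include_code_point = include_code_point

    def get(self, code_point: int) -> TTFGlyph:
        glyph_id = CMAP[code_point]
        # Keying by glyph ID alone means the first code point seen wins.
        key = (glyph_id, code_point) if self._include_code_point else glyph_id
        if key not in self._glyphs:
            self._glyphs[key] = TTFGlyph(glyph_id, code_point)
        return self._glyphs[key]


def shapeable(text: str, include_code_point: bool = False) -> list[bool]:
    """For each character, report whether forms conversion would find it."""
    cache = GlyphCache(include_code_point)
    return [cache.get(ord(ch)).code_point in FORMS_TABLE for ch in text]


print(shapeable("\u067E\uFB56"))  # base letter first
print(shapeable("\uFB56\u067E"))  # presentation form first
print(shapeable("\uFB56\u067E", include_code_point=True))
```

In this model, when the presentation form is seen first, the cached code point is U+FB56 and the later U+067E lookup no longer matches the forms table. Including the code point in the cache key is one possible way to remove the order dependence; whether that suits the library's actual design is an open question.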

Examples

Input data: "\u067E\u0646 \r\n \uFB56\uFEE5"

image

Input data: "\uFB56\uFEE5 \r\n \u067E\u0646"

image
