
Unittest: Fix Thai valid text and add Thai illegal sequences #2455

Merged
merged 2 commits into from
May 25, 2019


bact
Contributor

@bact bact commented May 22, 2019

Fix typos in the "valid text" `kScriptText`:

  • Replace the misspelled word "ท่ำ" with "ทำ" (do, make)
  • Replace the misspelled word "ธปเทียน" with "ธูปเทียน" (incense and candles)
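For reviewers who do not read Thai, listing the code points makes each typo fix visible: the corrected "ทำ" is THO THAHAN (0E17) + SARA AM (0E33) with no tone mark in between, and "ธูปเทียน" gains SARA UU (0E39) after the first consonant. The following is a minimal C++ sketch, assuming well-formed UTF-8 input; `DecodeUtf8` and `CodePoints` are hypothetical helpers written for illustration, not part of Tesseract.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Decode a UTF-8 string into Unicode code points.
// Sketch only: assumes well-formed UTF-8 and asserts on malformed
// continuation bytes instead of recovering from them.
std::vector<char32_t> DecodeUtf8(const std::string& s) {
  std::vector<char32_t> out;
  for (std::size_t i = 0; i < s.size();) {
    const unsigned char lead = static_cast<unsigned char>(s[i]);
    // Sequence length from the lead byte: 0xxxxxxx, 110xxxxx, 1110xxxx, 11110xxx.
    const int len = lead < 0x80 ? 1 : (lead >> 5) == 0x6 ? 2 : (lead >> 4) == 0xE ? 3 : 4;
    char32_t cp = len == 1 ? lead : (lead & (0x7F >> len));
    for (int k = 1; k < len; ++k) {
      assert(i + k < s.size());
      const unsigned char cont = static_cast<unsigned char>(s[i + k]);
      assert((cont & 0xC0) == 0x80);  // continuation bytes look like 10xxxxxx
      cp = (cp << 6) | (cont & 0x3F);
    }
    out.push_back(cp);
    i += len;
  }
  return out;
}

// Format code points the way this PR writes them, e.g. "0E17 0E33".
std::string CodePoints(const std::string& utf8) {
  std::string result;
  char buf[16];
  for (char32_t cp : DecodeUtf8(utf8)) {
    std::snprintf(buf, sizeof buf, "%04X", static_cast<unsigned>(cp));
    if (!result.empty()) result += ' ';
    result += buf;
  }
  return result;
}
```

For example, `CodePoints("ธูปเทียน")` gives `0E18 0E39 0E1B 0E40 0E17 0E35 0E22 0E19`; the old spelling is the same sequence without `0E39`.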

Add two illegal sequences to `kBadlyFormedThaiWords`:

The code was:

const char* kBadlyFormedThaiWords[] = {"ฤิ", "กา้ํ", "กิำ"};

Proposed:

const char* kBadlyFormedThaiWords[] = {"ฤิ", "กา้ํ", "กิำ", "นำ้", "เเก"};

The first addition, "นำ้", is an illegal sequence for "น้ำ" (water).

The legal sequence for "น้ำ" is:

  • น (0E19) + ้ (0E49) + ำ (0E33)

but it is sometimes found in this sequence instead (the wrong order, which can still render visually similar):

  • น (0E19) + ำ (0E33) + ้ (0E49)
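The ordering problem in "น้ำ" can be detected mechanically at the byte level. SARA AM (U+0E33) is `E0 B8 B3` in UTF-8, and the four tone marks MAI EK to MAI CHATTAWA (U+0E48 to U+0E4B) are `E0 B9 88` to `E0 B9 8B`, so a tone mark directly after SARA AM is a fixed byte pattern. This is a hypothetical helper for illustration, not Tesseract's validator, and it covers only this one misordering.

```cpp
#include <cstddef>
#include <string>

// UTF-8 bytes of THAI CHARACTER SARA AM (U+0E33).
const std::string kSaraAm = "\xE0\xB8\xB3";

// True if the three bytes at `pos` encode one of the tone marks U+0E48..U+0E4B.
bool IsToneMarkAt(const std::string& s, std::size_t pos) {
  if (pos + 3 > s.size()) return false;
  const auto b0 = static_cast<unsigned char>(s[pos]);
  const auto b1 = static_cast<unsigned char>(s[pos + 1]);
  const auto b2 = static_cast<unsigned char>(s[pos + 2]);
  return b0 == 0xE0 && b1 == 0xB9 && b2 >= 0x88 && b2 <= 0x8B;
}

// Hypothetical check: true if a tone mark directly follows SARA AM, the
// misordered form of words like "water" (legal order: consonant + tone mark + SARA AM).
bool HasToneMarkAfterSaraAm(const std::string& utf8) {
  for (std::size_t pos = utf8.find(kSaraAm); pos != std::string::npos;
       pos = utf8.find(kSaraAm, pos + kSaraAm.size())) {
    if (IsToneMarkAt(utf8, pos + kSaraAm.size())) return true;
  }
  return false;
}
```

Working on raw UTF-8 bytes avoids a decoder here; it is safe because a UTF-8 lead byte such as `E0` never appears inside another character's encoding.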

The second addition, "เเก", is an illegal sequence for "แก" (you).

The legal sequence for "แก" is:

  • แ (0E41) + ก (0E01)

but it is sometimes found in this sequence instead (two 0E40s used in place of one 0E41, which looks visually similar):

  • เ (0E40) + เ (0E40) + ก (0E01)
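The look-alike in "แก" can be flagged the same way: SARA E (U+0E40) is `E0 B9 80` in UTF-8 and SARA AE (U+0E41) is `E0 B9 81`. This minimal sketch uses hypothetical function names; the repair step is an assumption beyond this PR, treating every SARA E + SARA E pair as a mistyped SARA AE, as described above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// UTF-8 bytes of THAI CHARACTER SARA E (U+0E40) and SARA AE (U+0E41).
const std::string kSaraE = "\xE0\xB9\x80";
const std::string kSaraAe = "\xE0\xB9\x81";

// True if the text contains SARA E twice in a row, the look-alike of SARA AE.
bool HasDoubledSaraE(const std::string& utf8) {
  return utf8.find(kSaraE + kSaraE) != std::string::npos;
}

// Heuristic repair (hypothetical): replace every SARA E + SARA E pair
// with a single SARA AE, scanning left to right.
std::string FixDoubledSaraE(std::string utf8) {
  const std::string doubled = kSaraE + kSaraE;
  for (std::size_t pos = utf8.find(doubled); pos != std::string::npos;
       pos = utf8.find(doubled, pos + kSaraAe.size())) {
    utf8.replace(pos, doubled.size(), kSaraAe);
  }
  assert(!HasDoubledSaraE(utf8));  // SARA AE cannot form a new SARA E pair
  return utf8;
}
```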

bact added 2 commits May 22, 2019 15:19
- Fix an invalid sequence in "valid text" `kScriptText`
- Add two illegal sequences to `kBadlyFormedThaiWords`
@bact bact changed the title Fix Thai valid text and add Thai illegal sequences Unittest: Fix Thai valid text and add Thai illegal sequences May 22, 2019
@zdenop
Contributor

zdenop commented May 25, 2019

@Shreeshrii: Are you able to check this PR?

@Shreeshrii
Collaborator

@zdenop I do not know the Thai language or script.

@bact
Please describe the changes you have made so that non-Thai speakers can understand them.
Explain what fixes you have made to the Thai valid text and the Thai illegal sequences.
Thanks.

@bact
Contributor Author

bact commented May 25, 2019

@zdenop @Shreeshrii thank you. I edited the description above and added an explanation.

@zdenop zdenop merged commit 12847d5 into tesseract-ocr:master May 25, 2019
@zdenop
Contributor

zdenop commented May 25, 2019

Thanks.
