Error in training using tesseract 5 #3563

Open
@mohammad69h94

Description

Environment

  • Tesseract Version: tesseract 5.0.0-alpha-20210401-123-g5eb2e8
  • Commit Number:
  • Platform: Linux (Ubuntu 18.04)

Current Behavior:

Training command:
./tesstrain.sh --fonts_dir fonts --fontlist 'B Nazanin' --lang fas --linedata_only --langdata_dir langdata_lstm --tessdata_dir tesseract/tessdata --save_box_tiff --maxpages 10 --output_dir train
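When reproducing this from a script, the same invocation can be built as an argument list. Each list element reaches the program as exactly one argument, so `B Nazanin` needs no shell quoting. This is a minimal Python sketch, assuming `tesstrain.sh` sits in the current directory as in the command above; the helper name `build_tesstrain_argv` is hypothetical, and only the flags shown in the report are used.

```python
import shlex
from typing import List


def build_tesstrain_argv(fonts_dir: str = "fonts", font: str = "B Nazanin",
                         lang: str = "fas", langdata_dir: str = "langdata_lstm",
                         tessdata_dir: str = "tesseract/tessdata",
                         maxpages: int = 10, output_dir: str = "train") -> List[str]:
    """Hypothetical helper: the tesstrain.sh call from the report as an argv list."""
    # Assumption: tesstrain.sh is in the working directory, as in the report.
    return [
        "./tesstrain.sh",
        "--fonts_dir", fonts_dir,
        "--fontlist", font,  # one list element, even though it contains a space
        "--lang", lang,
        "--linedata_only",
        "--langdata_dir", langdata_dir,
        "--tessdata_dir", tessdata_dir,
        "--save_box_tiff",
        "--maxpages", str(maxpages),
        "--output_dir", output_dir,
    ]


argv = build_tesstrain_argv()
# shlex.join (Python 3.8+) renders the list back into shell form for display only.
print(shlex.join(argv))
```

Passing the list to `subprocess.run(argv, check=True)` would run the script; `shlex.join` only shows the equivalent shell command, which matches the one above.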

=== Starting training for language 'fas'
[Thursday 16 September 21, 18:23:18 (+0430)] /usr/local/bin/text2image --fonts_dir=fonts --ptsize 12 --font=B Nazanin --outputbase=/tmp/font_tmp.Bi9gSouhv9/sample_text.txt --text=/tmp/font_tmp.Bi9gSouhv9/sample_text.txt --fontconfig_tmpdir=/tmp/font_tmp.Bi9gSouhv9
Stripped 1 unrenderable words
Rendered page 0 to file /tmp/font_tmp.Bi9gSouhv9/sample_text.txt.tif

=== Phase I: Generating training images ===
Rendering using B Nazanin
[Thursday 16 September 21, 18:23:19 (+0430)] /usr/local/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.Bi9gSouhv9 --fonts_dir=fonts --strip_unrenderable_words --leading=32 --xsize=3600 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/fas-2021-09-16.MJH/fas.B_Nazanin.exp0 --max_pages=10 --font=B Nazanin --ptsize 12 --text=langdata_lstm/fas/fas.training_text
Stripped 36 unrenderable words
Rendered page 0 to file /tmp/fas-2021-09-16.MJH/fas.B_Nazanin.exp0.tif
Stripped 22 unrenderable words
Rendered page 1 to file /tmp/fas-2021-09-16.MJH/fas.B_Nazanin.exp0.tif
Stripped 39 unrenderable words
Rendered page 2 to file /tmp/fas-2021-09-16.MJH/fas.B_Nazanin.exp0.tif
Stripped 33 unrenderable words
Rendered page 3 to file /tmp/fas-2021-09-16.MJH/fas.B_Nazanin.exp0.tif
Stripped 40 unrenderable words
Rendered page 4 to file /tmp/fas-2021-09-16.MJH/fas.B_Nazanin.exp0.tif
Stripped 26 unrenderable words
Error in boxCreate: y < 0 and box off +quad
Rendered page 5 to file /tmp/fas-2021-09-16.MJH/fas.B_Nazanin.exp0.tif
Stripped 28 unrenderable words
Rendered page 6 to file /tmp/fas-2021-09-16.MJH/fas.B_Nazanin.exp0.tif
Stripped 30 unrenderable words
Rendered page 7 to file /tmp/fas-2021-09-16.MJH/fas.B_Nazanin.exp0.tif
Stripped 29 unrenderable words
Rendered page 8 to file /tmp/fas-2021-09-16.MJH/fas.B_Nazanin.exp0.tif
Stripped 33 unrenderable words
Rendered page 9 to file /tmp/fas-2021-09-16.MJH/fas.B_Nazanin.exp0.tif
Null box at index 0
Error: Call PrepareToWrite before WriteTesseractBoxFile!!
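Phase I already goes wrong here: `Null box at index 0` and `Call PrepareToWrite before WriteTesseractBoxFile!!` suggest that text2image did not write a usable `fas.B_Nazanin.exp0.box`, and the later phases report that they cannot read it. A quick check after rendering can catch this before the remaining phases run. This is a minimal Python sketch; `count_box_entries` and `check_box_file` are hypothetical helpers, and they only test that the file exists and has non-blank lines, not that the box coordinates are valid.

```python
from pathlib import Path
from typing import Optional


def count_box_entries(box_text: str) -> int:
    """Count the non-blank lines in the text of a .box file (one box per line)."""
    return sum(1 for line in box_text.splitlines() if line.strip())


def check_box_file(box_path: str) -> Optional[str]:
    """Return None if the .box file exists and has entries, else a short reason.

    Hypothetical helper: it does not validate the coordinates on each line.
    """
    path = Path(box_path)
    if not path.is_file():
        return f"missing: {path}"
    if count_box_entries(path.read_text(encoding="utf-8")) == 0:
        return f"empty: {path}"
    return None


if __name__ == "__main__":
    # Path taken from the log above.
    problem = check_box_file("/tmp/fas-2021-09-16.MJH/fas.B_Nazanin.exp0.box")
    print(problem or "box file looks usable")
```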

=== Phase UP: Generating unicharset and unichar properties files ===
[Thursday 16 September 21, 18:23:24 (+0430)] /usr/local/bin/unicharset_extractor --output_unicharset /tmp/fas-2021-09-16.MJH/fas.unicharset --norm_mode 2 /tmp/fas-2021-09-16.MJH/fas.B_Nazanin.exp0.box
Failed to read data from: /tmp/fas-2021-09-16.MJH/fas.B_Nazanin.exp0.box
Wrote unicharset file /tmp/fas-2021-09-16.MJH/fas.unicharset
[Thursday 16 September 21, 18:23:24 (+0430)] /usr/local/bin/set_unicharset_properties -U /tmp/fas-2021-09-16.MJH/fas.unicharset -O /tmp/fas-2021-09-16.MJH/fas.unicharset -X /tmp/fas-2021-09-16.MJH/fas.xheights --script_dir=langdata_lstm
Loaded unicharset of size 3 from file /tmp/fas-2021-09-16.MJH/fas.unicharset
Setting unichar properties
Setting script properties
Writing unicharset to file /tmp/fas-2021-09-16.MJH/fas.unicharset
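`Loaded unicharset of size 3` fits with the preceding `Failed to read data from` message for the `.box` file. As far as I understand, a freshly created unicharset already holds three special entries (the NULL/space entry, `Joined` and `|Broken|0|1`), so a size of 3 would mean unicharset_extractor found no characters at all. A unicharset file starts with its entry count on the first line, so this can be checked directly. A minimal Python sketch, assuming that file layout; the helper names and the threshold of 3 are assumptions, not Tesseract settings.

```python
from pathlib import Path

# Assumption: a unicharset containing only Tesseract's built-in special
# entries (NULL/space, Joined, |Broken|0|1) has exactly this many entries.
SPECIAL_ENTRY_COUNT = 3


def unicharset_size(unicharset_text: str) -> int:
    """Read the entry count from the first line of a unicharset file's text."""
    lines = unicharset_text.strip().splitlines()
    if not lines:
        raise ValueError("empty unicharset text")
    return int(lines[0].split()[0])


def has_no_real_characters(unicharset_text: str) -> bool:
    """True if the unicharset holds nothing beyond the special entries."""
    return unicharset_size(unicharset_text) <= SPECIAL_ENTRY_COUNT


if __name__ == "__main__":
    # Path taken from the log above.
    path = Path("/tmp/fas-2021-09-16.MJH/fas.unicharset")
    if path.is_file():
        text = path.read_text(encoding="utf-8")
        print("size:", unicharset_size(text), "no characters:", has_no_real_characters(text))
```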

=== Phase E: Generating lstmf files ===
Using TESSDATA_PREFIX=tesseract/tessdata
[Thursday 16 September 21, 18:23:24 (+0430)] /usr/local/bin/tesseract /tmp/fas-2021-09-16.MJH/fas.B_Nazanin.exp0.tif /tmp/fas-2021-09-16.MJH/fas.B_Nazanin.exp0 lstm.train
Tesseract Open Source OCR Engine v5.0.0-alpha-20210401-123-g5eb2e8 with Leptonica
Page 1
Failed to read boxes from /tmp/fas-2021-09-16.MJH/fas.B_Nazanin.exp0.tif
Error during processing.
ERROR: Program tesseract failed. Abort. Command line: /usr/local/bin/tesseract /tmp/fas-2021-09-16.MJH/fas.B_Nazanin.exp0.tif /tmp/fas-2021-09-16.MJH/fas.B_Nazanin.exp0 lstm.train
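The final abort in Phase E (`Failed to read boxes from ...`) is most likely a downstream effect: the first error in the log appears in Phase I, and Phase UP already could not read the `.box` file. For longer logs, locating the first failing phase can be automated. This is a minimal Python sketch that tracks the most recent `=== Phase ...:` header and returns the first line containing "error" or "failed"; matching on those two words is an assumption based on the messages in this log, not a documented tesstrain.sh convention, and `first_error` is a hypothetical helper.

```python
import re

# Matches headers such as "=== Phase I: Generating training images ===".
PHASE_RE = re.compile(r"^=== (Phase [^:]+):")
# Assumption: failures in this log contain "error" or "failed" (any case).
ERROR_RE = re.compile(r"\b(?:error|failed)\b", re.IGNORECASE)


def first_error(log_text: str):
    """Return (phase, line) for the first error-looking line, or None."""
    phase = "before Phase I"
    for line in log_text.splitlines():
        header = PHASE_RE.match(line)
        if header:
            phase = header.group(1)
            continue
        if ERROR_RE.search(line):
            return phase, line.strip()
    return None


# Shortened excerpt of the log above.
SAMPLE_LOG = """\
=== Starting training for language 'fas'
Stripped 1 unrenderable words

=== Phase I: Generating training images ===
Rendering using B Nazanin
Stripped 26 unrenderable words
Error in boxCreate: y < 0 and box off +quad
Null box at index 0

=== Phase E: Generating lstmf files ===
Failed to read boxes from /tmp/fas-2021-09-16.MJH/fas.B_Nazanin.exp0.tif
"""

print(first_error(SAMPLE_LOG))
```

On the excerpt this prints `('Phase I', 'Error in boxCreate: y < 0 and box off +quad')`, pointing back at the text2image rendering step rather than at the `lstm.train` call that aborted.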

fas-2021-09-16.MJH.zip
