Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue 1139: Patch fixing text2image's --only_extract_font_properties mode to print variant kerning spacing #7

Closed
wants to merge 1 commit into from

Conversation

jimregan
Copy link
Contributor

https://code.google.com/p/tesseract-ocr/issues/detail?id=1139

Reported by nick.white@durham.ac.uk, Apr 8, 2014
The ok_count == 2 part of the test always failed, which prevented the kerning variations from ever being printed. As far as I can tell
it wasn't doing anything useful, but my C++ is (still) rusty, so
some light checking would be wise.

The patch also updates the function description to be in line with the current output.

Apr 24, 2014
#1 theraysmith

This could significantly change results after training.
Going to apply this change in a carefully controlled environment.

Apr 25, 2014
#2 nick.white@durham.ac.uk

This patch did make the example in the comments (A+V) output as expected, but it doesn't seem to have affected the fact that no kerning variations are printed for any ancient greek digrams. I'm not sure why, whether it's the ancient greek font, or there is something still not right about the code.

(http://web.archive.org/web/20150510031850/https://code.google.com/p/tesseract-ocr/issues/detail?id=1139)

… kerning spacing

The ok_count == 2 part of the test always failed, which prevented
the kerning variations from ever being printed. As far as I can tell
it wasn't doing anything useful, but my C++ is (still) rusty, so
some light checking would be wise.

Also updates the function description to be in line with the
current output
@amitdo
Copy link
Collaborator

amitdo commented Apr 28, 2017

@theraysmith, should we close this PR?

@theraysmith
Copy link
Contributor

I think so. Even the legacy engine didn't use that data much, the new engine doesn't use it at all.

@amitdo
Copy link
Collaborator

amitdo commented Apr 30, 2017

@zdenop, please close this PR.

@zdenop zdenop closed this Apr 30, 2017
stweil referenced this pull request in stweil/tesseract May 20, 2018
The following code caused a crash when Tesseract was compiled with -ftrapv:

1259	  int width = right - left;

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007ffff665c231 in __GI_abort () at abort.c:79
#2  0x00007ffff69e34d8 in __subvsi3 () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#3  0x000055555560c1c5 in tesseract::ColPartitionGrid::FindVPartitionPartners (this=0x55555717e3c0, to_the_left=true, part=0x5555571fa380)
    at ../../../src/textord/colpartitiongrid.cpp:1259
#4  0x000055555560bda0 in tesseract::ColPartitionGrid::FindPartitionPartners (this=0x55555717e3c0) at ../../../src/textord/colpartitiongrid.cpp:1196
#5  0x00005555555f52b6 in tesseract::ColumnFinder::FindBlocks (this=0x55555717e280, pageseg_mode=tesseract::PSM_AUTO, scaled_color=0x0, scaled_factor=-1,
    input_block=0x555555f91390, photo_mask_pix=0x555555f73300, thresholds_pix=0x555555f76290, grey_pix=0x555555f762e0, pixa_debug=0x7ffff7fc88d8, blocks=0x7fffffffd250,
    diacritic_blobs=0x7fffffffd330, to_blocks=0x7fffffffd328) at ../../../src/textord/colfind.cpp:431
#6  0x00005555555c240d in tesseract::Tesseract::AutoPageSeg (this=0x7ffff7fa5010, pageseg_mode=tesseract::PSM_AUTO, blocks=0x555555f761d0, to_blocks=0x7fffffffd328,
    diacritic_blobs=0x7fffffffd330, osd_tess=0x0, osr=0x7fffffffd6d0) at ../../../src/ccmain/pagesegmain.cpp:229
#7  0x00005555555c1ffd in tesseract::Tesseract::SegmentPage (this=0x7ffff7fa5010, input_file=0x555555f7bd90, blocks=0x555555f761d0, osd_tess=0x0, osr=0x7fffffffd6d0)
    at ../../../src/ccmain/pagesegmain.cpp:141
#8  0x0000555555582540 in tesseract::TessBaseAPI::FindLines (this=0x555555a9a580 <main::api>) at ../../../src/api/baseapi.cpp:2291
#9  0x000055555557ce42 in tesseract::TessBaseAPI::Recognize (this=0x555555a9a580 <main::api>, monitor=0x0) at ../../../src/api/baseapi.cpp:802

Signed-off-by: Stefan Weil <sw@weilnetz.de>
noahmetzger pushed a commit to noahmetzger/tesseract that referenced this pull request Jul 31, 2018
The following code caused a crash when Tesseract was compiled with -ftrapv:

1259	  int width = right - left;

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
tesseract-ocr#1  0x00007ffff665c231 in __GI_abort () at abort.c:79
tesseract-ocr#2  0x00007ffff69e34d8 in __subvsi3 () from /lib/x86_64-linux-gnu/libgcc_s.so.1
tesseract-ocr#3  0x000055555560c1c5 in tesseract::ColPartitionGrid::FindVPartitionPartners (this=0x55555717e3c0, to_the_left=true, part=0x5555571fa380)
    at ../../../src/textord/colpartitiongrid.cpp:1259
tesseract-ocr#4  0x000055555560bda0 in tesseract::ColPartitionGrid::FindPartitionPartners (this=0x55555717e3c0) at ../../../src/textord/colpartitiongrid.cpp:1196
tesseract-ocr#5  0x00005555555f52b6 in tesseract::ColumnFinder::FindBlocks (this=0x55555717e280, pageseg_mode=tesseract::PSM_AUTO, scaled_color=0x0, scaled_factor=-1,
    input_block=0x555555f91390, photo_mask_pix=0x555555f73300, thresholds_pix=0x555555f76290, grey_pix=0x555555f762e0, pixa_debug=0x7ffff7fc88d8, blocks=0x7fffffffd250,
    diacritic_blobs=0x7fffffffd330, to_blocks=0x7fffffffd328) at ../../../src/textord/colfind.cpp:431
tesseract-ocr#6  0x00005555555c240d in tesseract::Tesseract::AutoPageSeg (this=0x7ffff7fa5010, pageseg_mode=tesseract::PSM_AUTO, blocks=0x555555f761d0, to_blocks=0x7fffffffd328,
    diacritic_blobs=0x7fffffffd330, osd_tess=0x0, osr=0x7fffffffd6d0) at ../../../src/ccmain/pagesegmain.cpp:229
tesseract-ocr#7  0x00005555555c1ffd in tesseract::Tesseract::SegmentPage (this=0x7ffff7fa5010, input_file=0x555555f7bd90, blocks=0x555555f761d0, osd_tess=0x0, osr=0x7fffffffd6d0)
    at ../../../src/ccmain/pagesegmain.cpp:141
tesseract-ocr#8  0x0000555555582540 in tesseract::TessBaseAPI::FindLines (this=0x555555a9a580 <main::api>) at ../../../src/api/baseapi.cpp:2291
tesseract-ocr#9  0x000055555557ce42 in tesseract::TessBaseAPI::Recognize (this=0x555555a9a580 <main::api>, monitor=0x0) at ../../../src/api/baseapi.cpp:802

Signed-off-by: Stefan Weil <sw@weilnetz.de>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants