Recognize with monitor crashes on specific file #1449
Please also try with latest code from GitHub and check if crash happens in
same place.
…On Tue 3 Apr, 2018, 12:46 PM oleg-st, ***@***.***> wrote:
Environment
- *Tesseract Version*: tesseract 4.00.00alpha
- *Commit Number*: f8e26ee
- *Platform*: Linux osboxes 3.10.0-693.17.1.el7.x86_64 #1 SMP Thu Jan 25 20:13:58 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Current Behavior:
Crash in api->Recognize(&monitor) with specific image file.
start >= 0 && start + num <= length_:Error:Assert failed:in file ratngs.cpp, line 324
Segmentation fault (core dumped)
Expected Behavior:
No crash.
Code to reproduce:
#include <cstdio>
#include "tesseract/baseapi.h"
#include "tesseract/genericvector.h"
#include "tesseract/renderer.h"
#include "tesseract/ocrclass.h"
#include <leptonica/allheaders.h>
int main()
{
  tesseract::TessBaseAPI *api = new tesseract::TessBaseAPI();
  ETEXT_DESC monitor;
  // Initialize tesseract-ocr with Russian, without specifying tessdata path
  if (api->Init(NULL, "rus")) {
    fprintf(stderr, "Could not initialize tesseract.\n");
    return 1;
  }
  // Open input image with leptonica library
  Pix *image = pixRead("test.jpg");
  api->SetImage(image);
  // Recognize (should crash here)
  api->Recognize(&monitor);
  // Release resources (not reached when the crash occurs)
  api->End();
  delete api;
  pixDestroy(&image);
  return 0;
}
test.jpg:
https://user-images.githubusercontent.com/7984580/38233906-d99b50b8-3724-11e8-95a2-86b18d88b034.jpg
Suggested Fix:
Backtrace:
#0 ERRCODE::error ***@***.***=0x7ffff7d8a640 <_ZL13ASSERT_FAILED>,
***@***.***=0x7ffff789e8f0 "start >= 0 && start + num <= length_", ***@***.***=ABORT,
***@***.***=0x7ffff787eeb4 "in file %s, line %d") at errcode.cpp:86
#1 0x00007ffff7855581 in WERD_CHOICE::remove_unichar_ids (this=0x135d800, ***@***.***=1, ***@***.***=1) at ratngs.cpp:324
#2 0x00007ffff784850d in remove_unichar_id (index=1, this=<optimized out>) at ratngs.h:481
#3 WERD_RES::MergeAdjacentBlobs ***@***.***=0x20979f0, ***@***.***=0) at pageres.cpp:972
#4 0x00007ffff76db80a in tesseract::Tesseract::write_results ***@***.***=0x7ffff7fb0010, page_res_it=...,
newline_type=<optimized out>, ***@***.***=0 '\000') at output.cpp:192
#5 0x00007ffff76dbdb3 in tesseract::Tesseract::output_pass ***@***.***=0x7ffff7fb0010, page_res_it=...,
***@***.***=0x0) at output.cpp:112
#6 0x00007ffff76c4968 in tesseract::Tesseract::recog_all_words (this=0x7ffff7fb0010, page_res=0x20a7e40,
***@***.***=0x7fffffffe280, ***@***.***=0x0, ***@***.***=0x0,
***@***.***=0) at control.cpp:425
#7 0x00007ffff76af41c in tesseract::TessBaseAPI::Recognize (this=0x8059d0, monitor=0x7fffffffe280) at baseapi.cpp:869
#8 0x0000000000400b40 in main ()
*Tesseract crashes only if Recognize is called with a non-null monitor
parameter*.
The monitor parameter matters because of this condition:
control.cpp:
// changed by jetsoft
// needed for dll to output memory structure
if ((dopasses == 0 || dopasses == 2) && (monitor || tessedit_write_unlv))
output_pass(page_res_it, target_word_box);
// end jetsoft
My suggestion is to move this code block a few lines down (after "Remove
empty words, as these mess up the result iterators.").
I think there are empty words in page_res_it, and they cause the crash in
output_pass.
After this change there are no more crashes.
  // Write results pass.
  set_global_loc_code(LOC_WRITE_RESULTS);
  // This is now redundant, but retained commented so show how to obtain
  // bounding boxes and style information.
  PageSegMode pageseg_mode = static_cast<PageSegMode>(
      static_cast<int>(tessedit_pageseg_mode));
  textord_.CleanupSingleRowResult(pageseg_mode, page_res);
  // Remove empty words, as these mess up the result iterators.
  for (page_res_it.restart_page(); page_res_it.word() != NULL;
       page_res_it.forward()) {
    WERD_RES* word = page_res_it.word();
    POLY_BLOCK* pb = page_res_it.block()->block != NULL
                         ? page_res_it.block()->block->poly_block()
                         : NULL;
    if (word->best_choice == NULL || word->best_choice->length() == 0 ||
        (word->best_choice->IsAllSpaces() && (pb == NULL || pb->IsText()))) {
      page_res_it.DeleteCurrentWord();
    }
  }
  // changed by jetsoft
  // needed for dll to output memory structure
  if ((dopasses == 0 || dopasses == 2) && (monitor || tessedit_write_unlv))
    output_pass(page_res_it, target_word_box);
  // end jetsoft
  if (monitor != NULL) {
    monitor->progress = 100;
  }
  return true;
}
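The reasoning behind this reordering can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Tesseract code: ToyWord, ToyOutputPass and RemoveEmptyWords are hypothetical stand-ins, and the model simply takes the hypothesis above (empty words reach the output pass) as given. It reuses only the shape of the failed check, start >= 0 && start + num <= length_.

```cpp
#include <algorithm>
#include <cassert>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a recognized word; NOT Tesseract's WERD_RES.
struct ToyWord {
  std::vector<int> ids;  // stand-in for the ids of the word's best choice

  // Same shape as the failed check: start >= 0 && start + num <= length.
  // Throws instead of aborting so the failure can be observed.
  void RemoveIds(int start, int num) {
    if (!(start >= 0 && start + num <= static_cast<int>(ids.size()))) {
      throw std::out_of_range("start >= 0 && start + num <= length_");
    }
    ids.erase(ids.begin() + start, ids.begin() + start + num);
  }
};

// Toy output step: removes the id at index 1 of every word, as in the
// backtrace (start=1, num=1). It fails on any word with fewer than two ids.
void ToyOutputPass(std::vector<ToyWord>& words) {
  for (ToyWord& w : words) w.RemoveIds(1, 1);
}

// Toy version of the "remove empty words" loop.
void RemoveEmptyWords(std::vector<ToyWord>& words) {
  words.erase(std::remove_if(words.begin(), words.end(),
                             [](const ToyWord& w) { return w.ids.empty(); }),
              words.end());
  for (const ToyWord& w : words) assert(!w.ids.empty());
}

// Original order: output pass first, so the empty word trips the check.
bool OutputBeforeCleanupFails() {
  std::vector<ToyWord> words = {ToyWord{{10, 11}}, ToyWord{{}}, ToyWord{{12, 13}}};
  try {
    ToyOutputPass(words);
  } catch (const std::out_of_range&) {
    return true;
  }
  return false;
}

// Suggested order: empty words are removed before the output pass runs.
bool OutputAfterCleanupSucceeds() {
  std::vector<ToyWord> words = {ToyWord{{10, 11}}, ToyWord{{}}, ToyWord{{12, 13}}};
  RemoveEmptyWords(words);
  try {
    ToyOutputPass(words);
  } catch (const std::out_of_range&) {
    return false;
  }
  return words.size() == 2 && words[0].ids.size() == 1;
}
```

Calling OutputBeforeCleanupFails() and OutputAfterCleanupSucceeds() from a main function exercises both orderings. Whether the real crash has exactly this cause is an assumption of the model.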
Same behavior detected with the latest code from GitHub.
Duplicate of #989.
However, it's not fixed in #989.
Did you try to do something similar to:
Yes, it doesn't help. The crash occurs with any monitor (not null).
Any news?
@oleg-st : I am not able to reproduce it with the current code on openSUSE 15 64bit. I slightly modified your code and it works for me:
Pix *image = pixRead("test.jpg");
api->SetImage(image);
api->Recognize(&monitor);
char* outText = api->GetUTF8Text();
printf("OCR output:\n%s", outText);
if (outText)
  delete [] outText;
I also tried another "Russian" image from issue 1912 and it works for me.
Probably a duplicate of #948.
It's tessdata_best. I will try to reproduce with the latest code.
@zdenop Code (slightly modified):
#include <cstdio>
#include "tesseract/baseapi.h"
#include "tesseract/ocrclass.h"
#include <leptonica/allheaders.h>
int main()
{
  tesseract::TessBaseAPI *api = new tesseract::TessBaseAPI();
  ETEXT_DESC* monitor = new ETEXT_DESC();
  // Initialize tesseract-ocr with Russian, without specifying tessdata path
  if (api->Init(NULL, "rus")) {
    fprintf(stderr, "Could not initialize tesseract.\n");
    return 1;
  }
  Pix *image = pixRead("test.jpg");
  api->SetImage(image);
  api->Recognize(monitor);
  char* outText = api->GetUTF8Text();
  printf("OCR output:\n%s", outText);
  if (outText)
    delete [] outText;
  api->End();
  delete monitor;
  return 0;
}
Output:
Try this advice:
@amitdo It helped.
The problem is that the unlv renderer needs it. Changing Setting