Assert failed #989

Closed
jorgeflorez opened this issue Jun 14, 2017 · 7 comments

jorgeflorez commented Jun 14, 2017

Hello,
I was trying to perform OCR using the tesserocr wrapper (Python) with this code:

from tesserocr import PyTessBaseAPI
from PIL import Image

# Load the image with Pillow and hand the PIL object to tesserocr
img = Image.open("/home/jorgeflorez/Downloads/c1.png")
tesseract = PyTessBaseAPI(lang="spa")
tesseract.SetImage(img)
tesseract.Recognize()
print(tesseract.GetUTF8Text())
tesseract.End()

For the image I tried, I got this in the console:
start >= 0 && start + num <= length_:Error:Assert failed:in file ratngs.cpp, line 321

I reported the problem to the creator of the wrapper, and he provided me with a workaround that makes it work:

from tesserocr import PyTessBaseAPI

filePath = "/home/jorgeflorez/Downloads/c1.png"

tesseract = PyTessBaseAPI(lang="spa")
# Let Tesseract read the file itself instead of passing a PIL image
tesseract.SetImageFile(filePath)
tesseract.Recognize()
print(tesseract.GetUTF8Text())
tesseract.End()

He thinks the problem is most likely in Tesseract rather than in Pillow, so I am reporting it here :)

This is what I am using:
Linux localhost.localdomain 3.10.0-514.21.1.el7.x86_64 #1 SMP Thu May 25 17:04:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

tesseract 3.05.01
leptonica-1.74.1
libjpeg 6b (libjpeg-turbo 1.2.90) : libpng 1.5.13 : libtiff 4.0.3 : zlib 1.2.7

And this is the image

I am not very familiar with the C++ API; otherwise I would write a program to reproduce the problem and confirm that this is a Tesseract issue. Thanks in advance.

Best Regards.
Jorge

sirfz commented Jun 17, 2017

Reference issue in tesserocr: sirfz/tesserocr/issues/55

Collaborator

amitdo commented Jun 18, 2017

@sirfz, can you provide a C++ example program that reproduces the issue?

sirfz commented Jun 18, 2017

Not tested, but it should look something like this (based on the basic API example):

#include <cstdio>
#include <cstdlib>

#include <tesseract/baseapi.h>
#include <tesseract/ocrclass.h>  // ETEXT_DESC
#include <leptonica/allheaders.h>

int main()
{
    tesseract::TessBaseAPI *api = new tesseract::TessBaseAPI();
    ETEXT_DESC monitor;

    // Initialize tesseract-ocr with Spanish, without specifying tessdata path
    if (api->Init(NULL, "spa")) {
        fprintf(stderr, "Could not initialize tesseract.\n");
        exit(1);
    }

    // Open input image with leptonica library
    Pix *image = pixRead("pil_image.png");
    api->SetImage(image);
    // Recognize (should crash here)
    api->Recognize(&monitor);

    // Release resources (only reached if the assert does not fire)
    api->End();
    delete api;
    pixDestroy(&image);

    return 0;
}

Here's the image saved using PIL:

pil_saved

sirfz commented Jun 18, 2017

After writing the code above, I modified tesserocr to call Recognize(NULL) instead of Recognize(&monitor), and it no longer crashes. For some reason the assertion only triggers on some images, but I guess my current usage is wrong and I should use NULL instead when there's no monitor?

Collaborator

amitdo commented Jun 18, 2017

> but I guess my current usage is wrong and I should use NULL instead when there's no monitor?

Yes.

If you want to use monitor, see here:
https://github.com/tesseract-ocr/tesseract/blob/8aa0a2dd48/api/baseapi.cpp#L1173

sirfz commented Jun 18, 2017

Thanks @amitdo for pointing to a usage example, will definitely add the timeout feature to tesserocr 👍
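
For anyone reading this later, here is a minimal Python sketch of how the pieces discussed above could fit together from the caller's side: read the file with SetImageFile, then call Recognize with an optional time limit. The helper name `ocr_file` and its `timeout_ms` parameter are hypothetical. It also assumes that the wrapper's Recognize accepts a timeout in milliseconds (0 meaning no limit) and returns a truthy value on success; that is an assumption about tesserocr after the timeout feature was added, not something confirmed in this thread.

```python
def ocr_file(api, image_path, timeout_ms=0):
    """Return the recognized text for image_path, or None if recognition fails.

    `api` is any object exposing SetImageFile, Recognize and GetUTF8Text,
    for example an initialized tesserocr.PyTessBaseAPI.
    ASSUMPTION: Recognize takes a timeout in milliseconds (0 = no limit)
    and returns a truthy value on success.
    """
    # Let Tesseract read the file itself, as in the workaround from the report.
    api.SetImageFile(image_path)
    if not api.Recognize(timeout_ms):
        # Recognition failed or was cancelled (e.g. the time limit was hit).
        return None
    return api.GetUTF8Text()


# Hypothetical usage with tesserocr (not run here):
#
#     from tesserocr import PyTessBaseAPI
#     with PyTessBaseAPI(lang="spa") as api:
#         text = ocr_file(api, "/home/jorgeflorez/Downloads/c1.png", timeout_ms=5000)
```

Taking the API object as a parameter keeps initialization and cleanup with the caller, and makes the helper easy to exercise with a stand-in object when Tesseract is not installed.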

Collaborator

amitdo commented Jun 21, 2017

The original issue was fixed by @sirfz in his tesserocr, so this issue should be closed.
