LSTM: Training: Deserialize Failed #792
Lines with error:
This is still happening with the latest code - the eval file was built using a different training text. git log -1
@stweil Is it related to #881 (comment)?
Maybe, I don't know. First I have to reproduce this.
@stweil In case you want to reproduce this with the files I was using: they are for the Bihari/Hindi language, Devanagari script. http://sanskritdocuments.org/hindi/bihtest.zip The zip file was too large to upload here or to my GitHub account.
I was using the Hindi traineddata from the tessdata repo as the basis for training.
http://sanskritdocuments.org/hindi/bihnewlayer.zip it has:
I have not included all the other intermediate .lstm files, since each is 33+ MB.
I had to fix the path in
Thanks for checking, @stweil. I will rebuild with the latest git master and test again, as I am still getting the same messages. Maybe these are also just 'info' messages!
At iteration 8306/10800/10808, Mean rms=0.765%, delta=1.311%, char train=4.216%, word train=10.221%, skip ratio=0.2%, New best char error = 4.216 wrote best model:/ho
Did you need to change the path in both to match your setup, or did you change the path in bihtest to match bihnew? I think the problem occurs when the training files and the evaluation files are different. The lstmf files in bihnew and bihtest were created using different training texts and font combinations.
Yes, I changed both files to match my home directory.
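Since the paths in the list files had to be edited to match each setup, a quick sanity check of a list file before training may help rule out plain path problems. The following is only a minimal sketch: `check_listfile` is a hypothetical helper, not part of Tesseract, and it assumes the list file holds one .lstmf path per line. It only reports entries that are missing or empty; it does not validate the .lstmf format itself, so it cannot detect files written by an incompatible version.

```python
import os
from typing import List, Tuple


def check_listfile(listfile_path: str) -> List[Tuple[str, str]]:
    """Return (path, problem) pairs for list-file entries that are missing or empty.

    Hypothetical helper: assumes one .lstmf path per line in the list file.
    """
    problems = []
    with open(listfile_path, encoding="utf-8") as handle:
        for line in handle:
            entry = line.strip()
            if not entry:
                continue  # ignore blank lines
            if not os.path.isfile(entry):
                problems.append((entry, "missing"))
            elif os.path.getsize(entry) == 0:
                problems.append((entry, "empty"))
    return problems
```

Running it on both the training and the evaluation list file and getting an empty result would at least suggest that the 'Deserialize failed' messages are not caused by wrong paths.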
Thanks, @stweil. However, I can now reproduce the error that you got when using lstmf files from a different version. @theraysmith I am getting these errors when I use lstmf files created before the commits regarding endianness. However, the location of the error is different from the others reported above in this thread.
Deserialize failed: /home/shree/tesstutorial/nyd/eng.1852nydir.exp1.lstmf read 0/8 pages
I tried running the latest code built with the --enable-debug option. Earlier the error was 'deserialize failed' (please see #792 (comment)). With --enable-debug, I get a core dump (same as #561).
gdb output pasted below:
@stweil was not able to reproduce this (#792 (comment)). The only difference I can see is that I run the program under WSL (bash on Windows 10).
Duplicate/Related #644
The new test in LSTMTrainer::UpdateErrorGraph fixes an assertion (see issues tesseract-ocr#644, tesseract-ocr#792). The new test in LSTMTrainer::ReadTrainingDump was added to improve the robustness of the code. Signed-off-by: Stefan Weil <stefan.weil@bib.uni-mannheim.de>
The new test in LSTMTrainer::UpdateErrorGraph fixes an assertion (see issues tesseract-ocr#644, tesseract-ocr#792). The new test in LSTMTrainer::ReadTrainingDump was added to improve the robustness of the code. Signed-off-by: Stefan Weil <sw@weilnetz.de>
Still getting the error:
ref: https://travis-ci.org/Shreeshrii/tess4train/builds/252343478
Is this issue still present with the latest code?
Closing the issue, since the LSTM training process has changed and it is difficult to reproduce.
I added -eval_listfile /home/shree/tesstutorial/hineval/tmp.txt \ to my lstmtraining command once training had come down to less than 3% char error.
While training continues, I am getting messages saying 'Deserialize Failed'.