LSTM: Training - Eval not run from trainer #644
@theraysmith Any update on this?
@stweil Have you come across this problem while training? Any solution?
No, I did not see that assertion up to now. How can it be reproduced?
Please try the following, making appropriate changes for your fonts directory and tessdata directory.
On Windows, using the binaries from the AppVeyor artifacts, I am getting …
I will check with tesseract built under WSL and report separately.
I think the crash happens in builds with debug. Non-debug builds get some error messages but continue.
The non-debug build continues, but uses a bad index internally, so the results are invalid. Can you get a stack trace for the assertion?
@stweil Unrelated to this issue, one of the recent commits seems to have caused the program to slow down a lot - while training it seems to hang for a while, and even OCRing images seems to take longer. I am not sure how to verify/confirm this.
@Shreeshrii, please open a new issue about that regression. |
@stweil I don't see any patch here. |
Should the model_data here be pointing to --eval_listfile or lstmf files within it? |
@Shreeshrii, I am sorry, but somehow my last comment got lost. So once again: The assertions are caused by an index of 0 used for an empty vector. Since commit 907de59 the constructor of GenericVector no longer allocates memory for an empty vector.
I don't know whether returning … is the right fix.
I tried to analyze the problem ("why is there a data size 0?") and noticed that the behavior of the program is totally erratic when I run it in a debugger. There are several threads involved, and depending on my breakpoints the problem with the data size occurs or not. That looks like a synchronization issue, and so I decided to use Valgrind. The result is horrible:
Here is one example (total log file is about 28000 lines):
So some more work is needed. The patch shown above does not fix the real problems.
@stweil Thank you for looking into this.
may also be related. |
The new test in LSTMTrainer::UpdateErrorGraph fixes an assertion (see issues tesseract-ocr#644, tesseract-ocr#792). The new test in LSTMTrainer::ReadTrainingDump was added to improve the robustness of the code. Signed-off-by: Stefan Weil <stefan.weil@bib.uni-mannheim.de>
It looks like the pix which was created by pixScaleGeneral contains uninitialized pixel data.
Using valgrind, I am not able to find uninitialized pixels starting from pixScale (or pixScaleGeneral). I did something simple: I tried to replace line 1705 of pix2.c, which is pixCreateTemplateNoInit(), with …
and, to make sure that valgrind was catching it, I used a conditional in the inner loop. If any pixel were uninitialized, we'd get a message like …
My short test program …
I compiled it on Debian Stretch using …
Could that be the reason for the problem? Debian has Leptonica version 1.74.1-1, so maybe that fix is missing? Then Debian needs a newer version (and Tesseract should require Leptonica 1.74.2). CC @jbreiden.
Stefan, can you run your test program from our GitHub head, or replacing the 1.74.1 scale.c with the most recent one? That would determine if the normalization change fixed this problem. I ran your exact program with the current pixScaleGeneral(), on valgrind, and got no error.
Under what conditions should eval be run from trainer? Will training work if it is done without eval? |
After failing to find the problem with an uninitialized value from pixUnsharpMasking(), I will do the simplest thing, which is to make sure that the pixel values are initialized. Use of pixCreateTemplateNoInit(), instead of pixCreateTemplate(), is clearly a poor optimization. I will also remove other uses of the NoInit version in places where it's not obvious by inspection that all pixels are set.
Committed (#512) to leptonica. Hoping that this solves any uninitialized value problems. |
https://github.com/DanBloomberg/leptonica/releases/tag/1.74.3 Should we be using this for 4.00? |
As far as I see Leptonica 1.74.2 is sufficient (it solved the uninitialized value problems for Tesseract), but of course you can use the newer version, too. |
1.74.3 has a fix for the uninitialized issue (the topic of this thread), and some other bug fixes on Windows. It has far fewer Coverity scan 'bugs'. And I also made it configure-ready, as advertised in the README.
Dan, the uninitialized issue was already fixed with 1.74.2, and so were the Coverity scan issues.
Somehow I didn't realize that 1.74.2 fixed the uninitialized issue in unsharp masking. So now it's double-fixed. And 1.74.3 has even more fixed Coverity scan issues :-)
For 1.74.3, I wanted to make a configure-ready version available (and also, I hadn't done a tarball release before on GitHub). I plan to make all future releases that way.
So it seems that both 1.74.2 and 1.74.3 can be used with tesseract.
For the future, I'd like to remove the pixWriteDisplay*() functions from the library, which are only there to support some older versions of tesseract.
Hi there! I had exactly the same issue after exactly 100 iterations. I was using Leptonica 1.74.1 provided by MacPorts, then built 1.74.4 manually (just one hour after Dan put it out :) ), but I still get the same problem. What baffles me is … I'm using a 2015 MBP running OS X El Capitan (10.11.6) with Xcode 8.
Thanks!
@stweil First, thanks for the truly swift reply! It has been training for the past 2 hours for about 3,000 iterations and is still going. It's just enjoyable to see it continuing after days of struggling :) So has …
That fixed at least the currently known problems and should produce more stable results, as the trained data no longer depends on undefined values (which could produce random results).
When training with --eval_listfile in addition to --train_listfile, I have noticed that while all files from --train_listfile are loaded at the beginning of the training process, only the first two from --eval_listfile are loaded. However, I have not found any images from the --eval_listfile being used in the training (I am going by the detailed log per iteration which is displayed with --debug_interval -1). Does this mean that eval is still not being run? Or does the eval process not write a log message? If it is a question of not logging, is it possible to add a log message during evaluation?
Answered in https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00. With --debug_interval -1, the trainer outputs verbose text debug for every training iteration. @stweil Eval seems to be running now (though there are no debug messages to verify it). Thanks for the fix. Should I close this issue or does it need any other verification?
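For reference, a training invocation that exercises eval alongside training might look like the following sketch; all paths, file names, and the iteration count are placeholders for your own setup, not values taken from this thread:

```shell
# Hypothetical example; adjust paths to your own training setup.
lstmtraining \
  --model_output ~/tesstutorial/engoutput/base \
  --train_listfile ~/tesstutorial/engtrain/eng.training_files.txt \
  --eval_listfile ~/tesstutorial/engeval/eng.training_files.txt \
  --debug_interval -1 \
  --max_iterations 5000
```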
So, is this issue fixed?
@xlight Thanks to patches by @stweil and Leptonica 1.74.2, the assertion and the problem with uninitialized data related to this issue are fixed when using the latest code from GitHub. @theraysmith will have to verify that his original problem has been fixed, since I do not know what exact test he was referring to in his comment …
```
2 Percent improvement time=12184, best error was 8.073 @ 23307
Warning: LSTMTrainer deserialized an LSTMRecognizer!
At iteration 35491/53200/53202, Mean rms=0.165%, delta=1.673%, char train=6.063%, word train=19.746%, skip ratio=0%, New best char error = 6.063
At iteration 33032, stage 1, Eval Char error rate=19.697385, Word error rate=42.197884 wrote checkpoint.
At iteration 35535/53300/53302, Mean rms=0.166%, delta=1.706%, char train=6.108%, word train=19.751%, skip ratio=0%, New worst char error = 6.108
At iteration 34063, stage 1, Eval Char error rate=20.580924, Word error rate=41.515177 wrote checkpoint.
```
Eval is being run from trainer. Closing the issue. Thanks, @stweil.
The latest code change has reverted the fix.
Fixed by 45fb7dd |
See comment from Ray at
#542 (comment)