Fix and enable lstm related unittests #2180
@stweil Please see the attached log file. The error rates for many tests are much lower than the expected values. I wonder whether this is related to using the Batch/Mean error as the Best error. Is this the same way error rates are calculated in Tesseract? Of course, the difference could simply be because the 'testdata' is different.
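To make the distinction in the question concrete, here is a minimal C++ sketch under hypothetical definitions that are not taken from Tesseract's lstmtrainer. The "mean error" is the average of the per-iteration error rates within one batch, and the "best error" is the lowest batch mean seen so far. `MeanError` and `BestMeanError` are illustrative names only.

```cpp
#include <algorithm>
#include <limits>
#include <numeric>
#include <vector>

// Hypothetical definition (not Tesseract's): the mean error of one batch is
// the plain average of its per-iteration error rates, or 0.0 if it is empty.
double MeanError(const std::vector<double>& batch) {
  if (batch.empty()) return 0.0;
  return std::accumulate(batch.begin(), batch.end(), 0.0) /
         static_cast<double>(batch.size());
}

// Hypothetical definition (not Tesseract's): the best error is the minimum
// batch mean seen so far. Returns +infinity if no batches were given.
double BestMeanError(const std::vector<std::vector<double>>& batches) {
  double best = std::numeric_limits<double>::infinity();
  for (const auto& batch : batches) {
    best = std::min(best, MeanError(batch));
  }
  return best;
}
```

With made-up batches `{4, 3, 5}`, `{2, 3, 1}` and `{3, 4, 2}`, the last batch mean is 3 but the best error is 2. Under these definitions a figure reported as "best" can never exceed the most recent mean, so comparing it against an expected value computed another way could make the error rates look lower.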
Fixed a merge conflict. Signed-off-by: Stefan Weil <sw@weilnetz.de>
I added a commit to fix a merge conflict with Git master.
unittest/log.h
```diff
     break;
   case ERROR:
-    std::cout << "[ERROR] ";
+    std::cout << "\n[ERROR] ";
```
@Shreeshrii, did you find the implementation that Google uses, and does it add line feeds like that?
Google might be using the implementation in glog - https://github.com/google/glog/blob/master/src/windows/glog/logging.h
I added the linefeed because I thought it might increase readability. It could probably be replaced by a space.
I see. The Google implementation adds a linefeed at the end if the log string does not already end with one. As Tesseract has only a few users of `LOG`, I think the linefeed characters can be added locally when calling `LOG` if needed. I suggest removing the third commit, at least for now.
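The behavior described above (append a linefeed only when the message lacks one) can be sketched in a few lines of C++. `FormatLogLine` is a hypothetical helper, not glog's or Tesseract's actual `LOG` implementation; it only builds the string and leaves the choice of output stream to the caller.

```cpp
#include <string>

// Hypothetical helper (not glog's or Tesseract's API): builds one log line
// with a "[SEVERITY] " prefix and appends a linefeed only when the message
// does not already end with one, so callers need not add "\n" themselves.
std::string FormatLogLine(const std::string& severity, const std::string& msg) {
  std::string line = "[" + severity + "] " + msg;
  if (line.back() != '\n') {  // line is never empty: it always has the prefix
    line += '\n';
  }
  return line;
}
```

For example, `std::cout << FormatLogLine("ERROR", "disk full");` writes `[ERROR] disk full` followed by exactly one linefeed. Appending at the end, rather than prepending `\n` to the prefix as in the diff, avoids an empty line before the first message.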
@stweil I don't know how to remove the commit. I have added another commit reverting the change.
```diff
@@ -17,7 +17,7 @@
 #include "fileio.h"  // for tesseract::File
 #include "gtest/gtest.h"
 
-const char* FLAGS_test_tmpdir = ".";
+const char* FLAGS_test_tmpdir = "./tmp";
```
That directory is missing for builds that are not started in the root directory, so many tests currently fail or crash. Do we need this change?
When I ran `make check` in the Tesseract root directory, all files generated by the unit tests were created in the unittest root directory, and `make clean` did not remove them. So I thought it would help to have a separate directory for the generated files.
There may be a better way to accomplish this. Please change it as you see fit. Thanks.
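One possible way to keep a separate output directory without breaking builds started outside the root directory is to create the directory on demand. The sketch below assumes C++17 `std::filesystem`; `EnsureTestTmpDir` is a hypothetical helper, not part of Tesseract's unittest code, and assigning its result to `FLAGS_test_tmpdir` is only one assumed way to wire it in.

```cpp
#include <filesystem>
#include <string>
#include <system_error>

// Hypothetical helper (not Tesseract code): returns a usable directory for
// generated test files. Creates `dir` and any missing parents if needed;
// falls back to the current directory "." if `dir` cannot be created.
std::string EnsureTestTmpDir(const std::string& dir) {
  std::error_code ec;
  // Returns false without setting ec when the directory already exists.
  std::filesystem::create_directories(dir, ec);
  if (!ec && std::filesystem::is_directory(dir, ec)) {
    return dir;
  }
  return ".";
}
```

Using the non-throwing `std::error_code` overloads means a missing or unwritable directory degrades to the old `"."` behavior instead of crashing the test binary. Removing the directory on `make clean` would presumably still need its own Makefile rule.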
When built with `--enable-openmp`:
When built with `--disable-openmp`:
@stweil Ref: #2180 (comment). Have you had a chance to look into this? I reran tesstutorial today. According to Ray in https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#training-from-scratch
In my test run today
I will upload the required testdata files to the test repo.