Allow for text angle/gradient to be retrieved #4070
So paragraph detection will run twice if It can be solved by using this condition: Lines 903 to 907 in 424b17f
also in Then we won't need to expose What do you think? |
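Why not use the Leptonica solution?

#include <stdio.h>
#include <leptonica/allheaders.h>

int main() {
    l_float32 angle, conf;
    /* Load the skewed input image. */
    PIX *image = pixRead("rotate_image.png");
    if (!image) {
        fprintf(stderr, "Could not read rotate_image.png\n");
        return 1;
    }
    /* Estimate the skew angle and deskew (search at 2x reduction). */
    PIX *pix2 = pixFindSkewAndDeskew(image, 2, &angle, &conf);
    printf("Skew angle: %7.2f degrees; %6.2f conf\n", angle, conf);
    if (pix2) pixWrite("fixed_rotate_image.png", pix2, IFF_PNG);
    pixDestroy(&image);
    pixDestroy(&pix2);
    return 0;
} |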
Because it is not necessary. Tesseract does it anyway. |
My understanding is that users want to fix rotation before running OCR. |
First, using the text gradient number that Tesseract already calculates does not add any extra steps or runtime for images that are not flagged as having problematic text angles. While I'm sure that adding additional pre-processing steps is a viable solution in many contexts (e.g. processing scanned documents), when building applications where speed is a very high priority, sending all input images through an extra step is sub-optimal. My use case here is maintaining Tesseract.js, which is primarily used in web applications rather than document processing.

Second, Leptonica uses a different methodology from Tesseract for calculating the angle of the page, so using Leptonica's algorithm adds another point of failure. While both Tesseract and Leptonica sometimes calculate the angle incorrectly, if Tesseract calculates the angle incorrectly the OCR results were almost certainly going to be bad anyway (as this calculation occurs during the line detection step). Using the angle Tesseract calculates is inherently low-risk in that regard. On the other hand, as Leptonica uses a different algorithm, it can calculate the text gradient incorrectly in a way that harms images that would otherwise produce high-quality results. When testing both solutions with sample documents, I found the implementation using the angle calculated by Tesseract to produce better results.

Overall, while an individual user may decide that using a separate auto-rotate script is better for their workflow, I think that the angle calculated by Tesseract is useful information and do not believe there's any reason it should not be accessible to the user. |
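To make the "no extra step for unflagged images" point concrete, here is a minimal sketch of the decision logic. It assumes the gradient is a slope (rise over run), so the skew angle is its arctangent; the helper names and the 0.5 degree tolerance are hypothetical and chosen only for illustration, not taken from Tesseract or Tesseract.js.

```cpp
#include <cmath>

// Convert a text-line gradient (assumed to be a slope, rise over run)
// into a skew angle in degrees.
double GradientToDegrees(double gradient) {
  const double kPi = std::acos(-1.0);
  return std::atan(gradient) * 180.0 / kPi;
}

// Decide whether the (comparatively expensive) rotation step should run.
// min_degrees is a hypothetical tolerance, not a value used by Tesseract.
bool NeedsRotation(double gradient, double min_degrees = 0.5) {
  return std::fabs(GradientToDegrees(gradient)) >= min_degrees;
}
```

With a check like this, an image whose gradient is close to zero goes straight to recognition, and only images above the tolerance pay for a rotate-and-rerun pass.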
I think this makes sense conceptually, however if I understand correctly, such a change would impact the results returned by the |
You are right that my suggestion changes the current behavior, so it's not a good idea. I still think we should not expose My new suggestion is to choose one of these options:
Apart from this small issue, I like the new feature. |
@amitdo I changed |
After reading @Balearica's answer to your question, do you object to merging this PR? @stweil, can we merge it? |
First of all: changing or extending the C++ API should be reflected in the C API too. Next: changing the public API has an impact on symver, which has an impact on including new versions in major Linux distributions. This should be carefully planned. Personally (i.e. the next point is not a showstopper), I prefer that image-related operations are handled by Leptonica. Maybe I am missing something, so information on how the gradient is planned to be used (e.g. to measure speed/performance) would help make it clear to me ;-) |
Good point, I edited the C API to reflect this change. |
Regarding semver, the current revision of this PR only adds one method to the public API, so the next version should be 5.4.0. |
Anyone know when this will get merged? Almost a year now |
Thank you!
Tesseract already calculates the average gradient (angle) of text lines within Textord::TextordPage at present. The gradient of the text is useful information (as Tesseract performs poorly when the gradient is not [almost] zero); however, the average gradient only exists within the Textord::TextordPage function at present, with no way for users to access it. Using the API, getting the gradient currently requires running Recognize or AnalyseLayout and using the results to manually re-calculate the gradient.

This PR allows for users to directly retrieve the existing average gradient value calculated in Textord::TextordPage using a function named GetGradient. This function can be called any time after FindLines has been run.

I also made FindLines public so it can be run directly without running Recognize or AnalyseLayout first (running AnalyseLayout would result in paragraph recognition being run twice).

I've already used this branch to implement an auto-rotate feature in the latest version of Tesseract.js that (unlike adding an auto-rotate pre-processing step) does not negatively impact performance for images without problematic rotation. A basic script using GetGradient is below for demonstrative purposes, along with a test image (named rotate_image.png in the code).

Resolves #3836.