Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce POSIX data types #177

Merged
merged 1 commit into from
Nov 24, 2016
Merged

Introduce POSIX data types #177

merged 1 commit into from
Nov 24, 2016

Conversation

stweil
Copy link
Member

@stweil stweil commented Dec 23, 2015

POSIX provides portable data types for signed and unsigned integer values
of different size.

This patch maps those POSIX data types to the Tesseract specific types.
In a next step, the Tesseract data types can be eliminated by replacing
them with the POSIX data types.

Use also standard definitions for the printf format specifiers.
MS Visual Studio does not support that standard (at least not in older
versions), so local definitions are needed there.

NULL is standard, so a local definition should not be needed.

Signed-off-by: Stefan Weil sw@weilnetz.de

@stweil
Copy link
Member Author

stweil commented Dec 25, 2015

Updated patch: removed part for format specifiers, don't remove NULL definition (both issues should be done in separate patches).

@LinusU
Copy link

LinusU commented Jan 22, 2016

Maybe it would be even better to go thru the code and use, for example, int8_t instead of inT8...

@stweil
Copy link
Member Author

stweil commented Jan 22, 2016

@LinusU, yes, that would be the next step as soon as this pull request was accepted.

@stweil stweil changed the title Introduce POSIX data types and definitions Introduce POSIX data types Jan 22, 2016
@LinusU
Copy link

LinusU commented Jan 22, 2016

👍

@stweil
Copy link
Member Author

stweil commented Feb 18, 2016

Ping? Are there any more thoughts on my proposal?

@zdenop
Copy link
Contributor

zdenop commented Feb 18, 2016

...waiting for review @theraysmith

@stweil
Copy link
Member Author

stweil commented Mar 16, 2016

@theraysmith, ping. Do you support the idea of replacing Tesseract data types by POSIX data types (so I can prepare a follow-up pull request)?

Many other free software projects with similar compiler / host conditions have shown that using POSIX data types works. Especially for library interfaces, but also for the rest of the code, it would be good to get rid of project specific data types which don't provide any additional value compared with the standard.

@stweil
Copy link
Member Author

stweil commented Apr 30, 2016

Ping? I suggest to apply this patch now, wait one more month and then replace all Tesseract integer types by the POSIX types.

@stweil
Copy link
Member Author

stweil commented May 15, 2016

@theraysmith, @zdenop, do you have any comments to my last proposal? Can we proceed like that?

@stweil
Copy link
Member Author

stweil commented Jul 8, 2016

Ping?

@stweil
Copy link
Member Author

stweil commented Aug 5, 2016

I still think it would be a good idea to replace all proprietary data types by the POSIX ones. Is it really necessary to wait for @theraysmith (who is obviously very busy)? Lots of other free (and also commercial) software projects work pretty good with the POSIX data types, using similar environments as Tesseract (Linux and other Unixes, Windows with Cygwin / MinGW-w64 / MS and other compilers).

So can we do the first step and apply this PR which is waiting for more than 7 months now?

@stweil
Copy link
Member Author

stweil commented Sep 8, 2016

Another month passed. I'd still like to see Tesseract switching to POSIX data types.

@Shreeshrii
Copy link
Collaborator

@zdenop said re: next release in #165 (comment)

I hear something like end of September 2016, but you never know ;-) It will big update (probably we will drop support of compilers nor support c++11)...

Will dropping support for compilers effect Tesseract switching to POSIX data types?

@stweil
Copy link
Member Author

stweil commented Oct 24, 2016

Will dropping support for compilers effect Tesseract switching to POSIX data types?

I don't think that POSIX data types are affected by the compiler decision. They exist for many years now, so any supported old or new compiler will work with POSIX data types.

POSIX provides portable data types for signed and unsigned integer values
of different size.

This patch maps those POSIX data types to the Tesseract specific types.
In a next step, the Tesseract data types can be eliminated by replacing
them with the POSIX data types.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
@stweil
Copy link
Member Author

stweil commented Nov 22, 2016

@theraysmith, may I kindly ask you to give your consent?

@theraysmith
Copy link
Contributor

Now we're synced, yes I think that would be a positive change, but it may be something to keep for 4.00 forward, so as to maintain maximum support for old compilers in the last 3.xx version.
(I already put a few experimental uses of nullptr in 4.00 to see if anyone squeals.)
Not being a git expert yet, I assume I can just hit the pull button and 3.xx can still be unaffected right?

@zdenop
Copy link
Contributor

zdenop commented Nov 22, 2016

Yes, just hit button "Merge pull request" if you wan to include it to currect (4.00) code

@zdenop zdenop merged commit 64159c7 into tesseract-ocr:master Nov 24, 2016
@LinusU
Copy link

LinusU commented Nov 24, 2016

Woohoo! 🎉

Now if we could only replace all occurences of inT8 with int8_t, uinT8 with uint8_t and so forth 🙌

@stweil stweil deleted the posix branch November 24, 2016 16:53
@stweil
Copy link
Member Author

stweil commented Nov 24, 2016

Yes, replacing those Tesseract data types by the POSIX data types will be the next step.

@zdenop, that means changes for nearly all source files which will give conflicts with pending pull requests. Should I nevertheless send one large PR which does the replacement, or would it be better to do it in smaller steps (starting for example with all files in lstm)? Are there more major changes pending after the integration of LSTM?

@zdenop
Copy link
Contributor

zdenop commented Nov 24, 2016

@theraysmith : Is there plan to commit new code that would interfere with this changes?
@stweil : I would wait for the moment ;-) so there is change to fix building of 4.0 (outside of linux). E.g. 1-2 weeks after lstm data files are available.

@stweil
Copy link
Member Author

stweil commented Nov 24, 2016

I already put a few experimental uses of nullptr in 4.00 to see if anyone squeals.

@theraysmith, training/stringrenderer.cpp already uses nullptr for more than two years now. AFAIK nobody complained, so that seems to work. Replacing NULL by nullptr would be good, but also touches many files, so this could be done in the same action as the switch to POSIX data types.

Do you care for comments after modified code? Replacing data types or NULL is easy, but the replacements are a little bit longer, and moving the comments to the right column means much hand work without code formatter.

@theraysmith
Copy link
Contributor

theraysmith commented Nov 26, 2016 via email

@theraysmith
Copy link
Contributor

FYI: This change breaks several tests in the Google repository because the Google int64 is long long and the posix int64_t is just long, and the compiler says they are not compatible.
Of course they shouldn't have used the wrong type to begin with.
I'm hoping that they can be fixed easily enough and it doesn't open a whole can of worms, but I thought you would be interested that it is not as easy as you thought.

@stweil
Copy link
Member Author

stweil commented Dec 1, 2016

Yes, the change to POSIX requires some work (not only in your tests), but it will help in the long run – for example as soon as there is a 128 bit architecture with long being int128_t. And it helps a lot as soon as you want to use Tesseract code in other software.

@egorpugin
Copy link
Contributor

There's also int_least64_t. http://en.cppreference.com/w/cpp/types/integer

stweil added a commit to stweil/tesseract that referenced this pull request Dec 8, 2018
This fixes a warning from the Intel compiler:

    src/textord/cjkpitch.cpp(79): warning tesseract-ocr#177:
      function "<unnamed>::SimpleStats::maximum" was declared
      but never referenced

Signed-off-by: Stefan Weil <sw@weilnetz.de>
stweil added a commit to stweil/tesseract that referenced this pull request Dec 8, 2018
This fixes warnings from the Intel compiler:

    src/textord/cjkpitch.cpp(319): warning tesseract-ocr#177:
      function "<unnamed>::FPRow::good_gaps" was declared but never referenced
    src/textord/cjkpitch.cpp(383): warning tesseract-ocr#177:
      function "<unnamed>::FPRow::is_bad" was declared but never referenced
    src/textord/cjkpitch.cpp(387): warning tesseract-ocr#177:
      function "<unnamed>::FPRow::is_unknown" was declared but never referenced

Signed-off-by: Stefan Weil <sw@weilnetz.de>
noahmetzger pushed a commit to noahmetzger/tesseract that referenced this pull request Feb 5, 2019
This fixes a warning from the Intel compiler:

    src/textord/cjkpitch.cpp(79): warning tesseract-ocr#177:
      function "<unnamed>::SimpleStats::maximum" was declared
      but never referenced

Signed-off-by: Stefan Weil <sw@weilnetz.de>
noahmetzger pushed a commit to noahmetzger/tesseract that referenced this pull request Feb 5, 2019
This fixes warnings from the Intel compiler:

    src/textord/cjkpitch.cpp(319): warning tesseract-ocr#177:
      function "<unnamed>::FPRow::good_gaps" was declared but never referenced
    src/textord/cjkpitch.cpp(383): warning tesseract-ocr#177:
      function "<unnamed>::FPRow::is_bad" was declared but never referenced
    src/textord/cjkpitch.cpp(387): warning tesseract-ocr#177:
      function "<unnamed>::FPRow::is_unknown" was declared but never referenced

Signed-off-by: Stefan Weil <sw@weilnetz.de>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants