Tfloat patch 4: bugfixes for AVX2 FAST_FLOAT Extract8+16 implementations #3494

Closed
wants to merge 11 commits into from

Conversation

GerHobbelt
Contributor

Extracted from #3490: bugfixes for the AVX2 Extract8+16 code, which contains lines such as `__m256d scale01234567 = _mm256_loadu_ps(scales)`, i.e. it loads float vectors into double vector types.

Note: the next pull request is a reduced version of this one, with less code duplication for the bleeding-edge tfloat branch.

stweil and others added 11 commits July 13, 2021 07:18
Up to now Tesseract used double for training and recognition
with "best" models.

This commit replaces double by a new data type TFloat which
is double by default, but float if FAST_FLOAT is defined.

Ideally this should allow faster training.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
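
The type switch described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the actual Tesseract source: the `FAST_FLOAT` macro and the `TFloat` name come from the message above, while `DotProduct` is a hypothetical helper added only to show code written once against `TFloat`.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// TFloat is double by default and float when FAST_FLOAT is defined,
// as described in the commit message.
#ifdef FAST_FLOAT
using TFloat = float;
#else
using TFloat = double;
#endif

static_assert(std::is_floating_point<TFloat>::value,
              "TFloat must be a floating point type");

// Hypothetical helper (an assumption, not a Tesseract function):
// written once against TFloat, it follows whichever precision was selected.
inline TFloat DotProduct(const TFloat *u, const TFloat *v, std::size_t n) {
  assert(u != nullptr && v != nullptr);
  TFloat total = 0;
  for (std::size_t i = 0; i < n; ++i) {
    total += u[i] * v[i];
  }
  return total;
}
```

Building with `-DFAST_FLOAT` would switch every `TFloat` in such code to single precision without touching the call sites.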
…6d scale01234567 = _mm256_loadu_ps(scales)`, i.e. loading float vectors into double vector types. Extract from tesseract-ocr#3490.
@egorpugin
Contributor

Hi,

You are doing patches wrong.

GerHobbelt added a commit to GerHobbelt/tesseract that referenced this pull request Jul 13, 2021
…ation: for TFloat to work, we don't need to duplicate the integer work functions, as only the ExtractResults16[8,16] functions need different implementations for float vs. double. These are therefore common to both implementations:

```
static void PartialMatrixDotVector64(const int8_t *wi, const TFloat *scales, const int8_t *u,
                                     int num_in, TFloat *v);

static void PartialMatrixDotVector32(const int8_t *wi, const TFloat *scales, const int8_t *u,
                                     int num_in, TFloat *v);

static void PartialMatrixDotVector16(const int8_t *wi, const TFloat *scales, const int8_t *u,
                                     int num_in, TFloat *v);

static inline void PartialMatrixDotVector8(const int8_t *wi, const TFloat *scales, const int8_t *u,
                                           int num_in, TFloat *v);

static void matrixDotVector(int dim1, int dim2, const int8_t *wi, const TFloat *scales,
                            const int8_t *u, TFloat *v);
```
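
The split described in this commit message, shared integer code with only the result-extraction step differing by precision, might look roughly like the sketch below. It is a simplified illustration under stated assumptions: `ScaleEight`, `ScaleSums` and `kLanes` are hypothetical names, and the real extraction functions do more than multiply by a scale. The point is the pairing of vector types and loads: `_mm256_loadu_ps` yields a `__m256` holding 8 floats, while `_mm256_loadu_pd` yields a `__m256d` holding only 4 doubles, so the double path needs two passes for 8 results. A scalar fallback keeps the sketch buildable without AVX.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__AVX__)
#include <immintrin.h>
#endif

#ifdef FAST_FLOAT
using TFloat = float;
#else
using TFloat = double;
#endif

constexpr int kLanes = 8;  // results produced per call (assumption)

// float path: one 256-bit register holds all 8 scales.
inline void ScaleEight(const int32_t *sums, const float *scales, float *v) {
#if defined(__AVX__)
  __m256i raw = _mm256_loadu_si256(reinterpret_cast<const __m256i *>(sums));
  __m256 scale = _mm256_loadu_ps(scales);  // float load into float vector
  _mm256_storeu_ps(v, _mm256_mul_ps(_mm256_cvtepi32_ps(raw), scale));
#else
  for (int i = 0; i < kLanes; ++i) v[i] = static_cast<float>(sums[i]) * scales[i];
#endif
}

// double path: a 256-bit register holds 4 doubles, so process two halves.
inline void ScaleEight(const int32_t *sums, const double *scales, double *v) {
#if defined(__AVX__)
  for (int half = 0; half < kLanes; half += 4) {
    __m128i raw = _mm_loadu_si128(reinterpret_cast<const __m128i *>(sums + half));
    __m256d scale = _mm256_loadu_pd(scales + half);  // double load into double vector
    _mm256_storeu_pd(v + half, _mm256_mul_pd(_mm256_cvtepi32_pd(raw), scale));
  }
#else
  for (int i = 0; i < kLanes; ++i) v[i] = static_cast<double>(sums[i]) * scales[i];
#endif
}

// Shared caller: written once against TFloat; overload resolution picks
// the matching extraction path.
inline void ScaleSums(const int32_t *sums, const TFloat *scales, TFloat *v, int n) {
  assert(n % kLanes == 0);
  for (int i = 0; i < n; i += kLanes) ScaleEight(sums + i, scales + i, v + i);
}
```

With overloads like these, only the two small extraction functions are precision-specific; everything that calls them can keep a single `TFloat`-based signature.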
@GerHobbelt
Contributor Author

Note: #3495 is this one (#3494) plus the FAST_FLOAT condition applied only to the ExtractXYZ calls, as the others are good to go with only their prototype adjusted from double --> TFloat. Hence #3495 only moves code compared to this one; there is no code change. (I don't know what diff tools you use, but this one (#3494) should be easier to diff/review; you can then verify that #3495 is only copy/cut/paste work, which results in a much larger diff.)

@GerHobbelt
Contributor Author

> Hi,
>
> You are doing patches wrong.

Crap. Yep, seen it. 😊

Discard. Will re-issue.

@GerHobbelt GerHobbelt closed this Jul 13, 2021
@GerHobbelt
Contributor Author

GerHobbelt commented Jul 13, 2021

🤔 I used the GitHub link and didn't notice that it targeted mainline master instead of stweil/tfloat. I checked the commits against my own visual commit graph and they were correct, but it was definitely wrong to submit them against master. Ugh.

@GerHobbelt
Contributor Author

Re-issued as stweil#4.
