Binary data incorrectly detected as text/numeric in "Browse Data" #2197

Closed
mbrijun opened this issue Apr 20, 2020 · 7 comments

@mbrijun

mbrijun commented Apr 20, 2020

Details for the issue

What did you do?

I have a SQLite database table with 76944 rows. The table has a BLOB column. All entries except the two below are correctly identified as "binary" by the "Browse Data" tab.

0000  ff fe cf 62 24 7c 80 37 49 b3 ed 1e 28 70 1f e8  ...b$|.7I...(p.. 
0010  b9 ad c3 d4 25 75 f9 29 22 37 e6 f8 36 d7 43 6c  ....%u.)"7..6.Cl 
0000  ff fe 64 4c 2a e4 37 4c 50 57 48 33 91 12 dd b4  ..dL*.7LPWH3.... 
0010  95 cd 83 a4 95 91 93 d9 b9 2c 57 ec b8 c9 12 08  .........,W..... 

What did you expect to see?

With "Automatically Adjust the editor mode to the loaded data type" enabled, I expected it to detect "Binary".

What did you see instead?

The autodetection classified these 2 cells as Text/Numeric. I had to disable autodetection and force the mode to "binary" to see the actual data. This may be related to bug #1772.

Useful extra information

DB4S v3.11.2 [built for x86_64-little_endian-llp64] on Windows 10 (10.0) (winnt/10.0.18362) [x86_64]
using SQLite Version 3.27.2
and Qt 5.11.3
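
Both dumps above begin with `ff fe`, which is the UTF-16 little-endian byte order mark (BOM). The following is a minimal Python sketch of a BOM-only heuristic, not DB4S code: the function name and the exact set of BOMs checked are assumptions made for illustration. It shows how such a heuristic would classify the first 32 bytes of both cells as text.

```python
import codecs

# First 32 bytes of the two cells, copied from the hex dumps in the report.
first = bytes.fromhex(
    "ff fe cf 62 24 7c 80 37 49 b3 ed 1e 28 70 1f e8"
    "b9 ad c3 d4 25 75 f9 29 22 37 e6 f8 36 d7 43 6c"
)
second = bytes.fromhex(
    "ff fe 64 4c 2a e4 37 4c 50 57 48 33 91 12 dd b4"
    "95 cd 83 a4 95 91 93 d9 b9 2c 57 ec b8 c9 12 08"
)

def looks_like_text_by_bom_only(data):
    """Hypothetical heuristic: any leading UTF BOM is taken as proof of text."""
    boms = (codecs.BOM_UTF8, codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)
    return data.startswith(boms)

for cell in (first, second):
    # Both cells start with ff fe, so the BOM-only heuristic reports text.
    print(cell[:2].hex(), looks_like_text_by_bom_only(cell))
```

Random binary data will occasionally start with these two bytes, which would be consistent with only a couple of rows out of many being affected.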

@justinclift
Member

Interesting. Would you be ok to try the 3.12.0 alpha1 build and see if the problem still shows up?

    https://sqlitebrowser.org/blog/first-alpha-release-for-3-12-0/

@mbrijun
Author

mbrijun commented Apr 20, 2020

Just checked with 3.12.0 alpha1 (it shows as 3.11.100 in the "About" screen). Unfortunately, both BLOBs still show as text.

@justinclift
Member

Thanks heaps for testing it @mbrijun. That means the bug is still in our latest code, and wasn't fixed over the year or so of changes since the 3.11.2 release. So, we'll need to investigate. @mgrojo Your kind of thing?

@mbrijun The "3.11.100" version number is there because we need something lower than "3.12.0". Otherwise update checks will go wrong when the proper 3.12.0 release comes out. So, we're using high numbers (100 and above) for the 3.11.x bit until then. 😉
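
The ordering described above can be illustrated with a short Python sketch. It assumes the update check compares dotted version numbers component by component as integers; the thread does not say how DB4S actually compares them, and `parse_version` is a hypothetical helper.

```python
def parse_version(text):
    """Split a dotted version string into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))

released = parse_version("3.11.2")
alpha = parse_version("3.11.100")
upcoming = parse_version("3.12.0")

# Tuples compare element by element, so the alpha sorts after the last
# 3.11.x release and before the final 3.12.0.
print(released < alpha < upcoming)
```

Note that a plain string comparison would not give this ordering, since "3.11.100" sorts before "3.11.2" character by character.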

@justinclift justinclift added the bug Confirmed bugs or reports that are very likely to be bugs. label Apr 20, 2020
@mgrojo mgrojo self-assigned this May 6, 2020
mgrojo added a commit that referenced this issue Jun 21, 2020
The presence of a sequence of bytes resembling a BOM does not guarantee
that the data is text. We can in those cases use the detection provided
by Qt. If the codec matches the one selected, we can consider that text.

See issue #2197
@mgrojo
Member

mgrojo commented Jun 21, 2020

@mbrijun It should be working in tomorrow's nightly build. Could you give it a try and confirm it? Previously, the presence of a byte sequence equal to a UTF BOM was taken to mean that the data was text. Now an additional check is done.

@MKleusberg, maybe you could review the changes. Do you think it could fail in any way? I don't know how to test all the possible encodings with BOMs.
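
As a rough illustration of the additional check, here is a Python sketch of the idea in the commit message: a leading BOM only counts as evidence of text if the codec it implies matches the selected encoding. This is not the actual change, which relies on Qt's own detection. The names `codec_from_bom` and `bom_indicates_text` are hypothetical, and UTF-8 as the selected encoding is an assumption.

```python
import codecs

# Known BOMs, longest first so UTF-32 LE (ff fe 00 00) is tried
# before UTF-16 LE (ff fe).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def codec_from_bom(data):
    """Return the codec name implied by a leading BOM, or None if there is none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None

def bom_indicates_text(data, selected_encoding):
    """True only if a BOM is present and its codec matches the selected encoding.

    False means the BOM gives no evidence of text; other heuristics
    would still have to decide between text and binary.
    """
    detected = codec_from_bom(data)
    if detected is None:
        return False
    # codecs.lookup() normalizes aliases such as "UTF-8" and "utf8".
    return codecs.lookup(detected).name == codecs.lookup(selected_encoding).name

# First cell from the report: the UTF-16 LE BOM does not match UTF-8.
blob = bytes.fromhex(
    "ff fe cf 62 24 7c 80 37 49 b3 ed 1e 28 70 1f e8"
    "b9 ad c3 d4 25 75 f9 29 22 37 e6 f8 36 d7 43 6c"
)
print(codec_from_bom(blob), bom_indicates_text(blob, "UTF-8"))
```

Under these assumptions the reported cells are no longer classified as text on the strength of the BOM alone, while data with a UTF-8 BOM in a UTF-8 table still is.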

@chrisjlocke
Member

> I don't know how to test all the possible encodings with BOMs.

Couldn't we have an override? So for database file X, column Y should be shown as Z. Don't know if/how/why it should be stored in the project file but for these weird edge cases, it would be useful to allow the user to override a guess made by DB4S.

@mgrojo
Member

mgrojo commented Jun 22, 2020

> Couldn't we have an override? So for database file X, column Y should be shown as Z. Don't know if/how/why it should be stored in the project file but for these weird edge cases, it would be useful to allow the user to override a guess made by DB4S.

One possibility would be to assume that any column whose data type is set to BLOB holds binary or image data. Nevertheless, it should be a preference, because there are cases where a BLOB column stores text in encodings other than UTF-8 or UTF-16.

Let's see first if it's true that there are still edge cases.
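
For illustration only, the preference idea could look roughly like the Python sketch below. Everything in it is hypothetical: `blob_columns`, `editor_mode`, and the `treat_blob_as_binary` flag are invented names, and no such preference is confirmed to exist in DB4S. It reads the declared column types through SQLite's `PRAGMA table_info` and lets the preference override the autodetected mode.

```python
import sqlite3

def blob_columns(conn, table):
    """Return the names of columns whose declared type is BLOB."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    rows = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    return {row[1] for row in rows if row[2].upper() == "BLOB"}

def editor_mode(column_is_blob, treat_blob_as_binary, autodetected):
    """Pick the editor mode; the preference overrides autodetection for BLOB columns."""
    if column_is_blob and treat_blob_as_binary:
        return "binary"
    return autodetected

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (id INTEGER, payload BLOB, note TEXT)")
blobs = blob_columns(conn, "demo")

# With the preference on, a cell wrongly autodetected as text is still shown as binary.
print(editor_mode("payload" in blobs, True, "text"))
# With the preference off, the autodetected mode is kept, which suits BLOBs holding text.
print(editor_mode("payload" in blobs, False, "text"))
```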

@mbrijun
Author

mbrijun commented Jun 22, 2020

June 22 nightly build seems to resolve my original issue. Thank you!

@mgrojo mgrojo closed this as completed Aug 6, 2020
@mgrojo mgrojo added this to the 3.12.1 milestone Aug 6, 2020
mgrojo added a commit that referenced this issue Aug 22, 2020