Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve EXIF handling #4002

Open
DDoS opened this issue Jun 15, 2024 · 7 comments
Open

Improve EXIF handling #4002

DDoS opened this issue Jun 15, 2024 · 7 comments

Comments

@DDoS
Copy link

DDoS commented Jun 15, 2024

Bug report

Describe the bug
EXIF Byte field is always formatted as a display string. For example the EXIF tag "exif-ifd3-GPSAltitudeRef" outputs "Sea level (Sea level, Byte, 1 components, 1 bytes)". For consistency with other types, I would've expected "0 (Sea level, Byte, 1 components, 1 bytes)", where the number comes first, and the display name is in the parentheses. I'm not sure if this intentional, but it makes parsing byte tags a lot more complicated.

To Reproduce
Steps to reproduce the behavior:

  1. Use an image with GPS EXIF data (like an iPhone photo)
  2. Read the field "exif-ifd3-GPSAltitudeRef"

Expected behavior
"0 (Sea level, Byte, 1 components, 1 bytes)"

Actual behavior
"Sea level (Sea level, Byte, 1 components, 1 bytes)"

Environment

  • OS: macOS 12.7.5
  • Vips: 8.15.1
@DDoS DDoS added the bug label Jun 15, 2024
@jcupitt
Copy link
Member

jcupitt commented Jun 17, 2024

Hello @DDoS,

You're right, libvips will print int, rational and srational EXIF tags as numbers, but it uses exif_entry_get_value() for all other types. This string is internationalised, so it'll vary with things like locale, annoyingly.

EXIF support in libvips hasn't really been touched since the 90s, and it was pretty poor then :( And making changes now would probably break some old code somewhere.

We've talked a few times about designing a better system, but no one's done the work yet. I think the most recent suggestion was to add a new metadata item called exif-data-json which would just be a simple dump of the parsed exif as a chunk of JSON. It could include both i18n values and raw values.

@jcupitt
Copy link
Member

jcupitt commented Jun 17, 2024

Let's tag this as an enhancement issue for a new EXIF system.

@jcupitt jcupitt added enhancement and removed bug labels Jun 17, 2024
@jcupitt jcupitt changed the title EXIF Byte field is always formatted as a display string Improve EXIF handling Jun 17, 2024
@jcupitt
Copy link
Member

jcupitt commented Jun 17, 2024

Any other suggestions for a new and better way to handle EXIF are very welcome, of course.

@lovell
Copy link
Member

lovell commented Jun 17, 2024

In the specific case of GPSAltitudeRef, it's an int8u according to exiftool, but libexif says its EXIF_FORMAT_BYTE. We could modify vips_exif_get_int() to detect these, but as you say this could easily break something else (maybe single-character string-like fields stored as byte?).

--- a/libvips/foreign/exif.c
+++ b/libvips/foreign/exif.c
@@ -244,6 +244,9 @@ vips_exif_get_int(ExifData *ed,
                *out = (int) exif_get_long(entry->data + offset, bo);
        else if (entry->format == EXIF_FORMAT_SLONG)
                *out = exif_get_slong(entry->data + offset, bo);
+       else if (entry->format == EXIF_FORMAT_BYTE &&
+               sizeof_component == 1)
+               *out = *(uint8_t *) entry->data + offset;
        else
                return -1;

@jcupitt
Copy link
Member

jcupitt commented Jun 17, 2024

Yes, exif has:

typedef enum {
        EXIF_FORMAT_BYTE       =  1,
        EXIF_FORMAT_ASCII      =  2,
        EXIF_FORMAT_SHORT      =  3,
        EXIF_FORMAT_LONG       =  4,
        EXIF_FORMAT_RATIONAL   =  5,
    EXIF_FORMAT_SBYTE      =  6,
        EXIF_FORMAT_UNDEFINED  =  7,
    EXIF_FORMAT_SSHORT     =  8,
        EXIF_FORMAT_SLONG      =  9,
        EXIF_FORMAT_SRATIONAL  = 10,
    EXIF_FORMAT_FLOAT      = 11,
    EXIF_FORMAT_DOUBLE     = 12
} ExifFormat;

So we could just switch on that, but it'd almost certainly break things :( I think we just need something better.

@DDoS
Copy link
Author

DDoS commented Jun 23, 2024

I think the most recent suggestion was to add a new metadata item called exif-data-json which would just be a simple dump of the parsed exif as a chunk of JSON. It could include both i18n values and raw values.

I like this idea, perhaps the i18n values should be a separate field though? Only downside I see is that it would require some users (mostly C and C++ projects) to take on a JSON library dependency to parse the output. But if the goal is to keep this as a string field, I still think it's a better outcome than using a custom format that has to be parsed manually.

@jcupitt
Copy link
Member

jcupitt commented Jun 23, 2024

Yes, exactly, you could have eg.:

{
  "exif": {
    "ifd0": {
      "Orientation": "1",
      "Orientation-i18n": "Top-left",
      "XResolution": "72000/1000",
      "XResolution-i18n": "72",
...

Or I suppose the i18n values could be in a separate ifd0-i18n tag. It probably doesn't matter much.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants