Fixes in to_tensorflow method #44

Closed
AbhinavTuli opened this issue Sep 21, 2020 · 6 comments · Fixed by #90 or #140

@AbhinavTuli
Contributor

AbhinavTuli commented Sep 21, 2020

I observed a couple of problems when converting stored datasets to TensorFlow format; both need small fixes.

to_tensorflow fails when the meta information for a tensor includes dtype="object" ("object" dtype has been used for images, area, id, bbox in Coco dataset - https://github.com/activeloopai/Hub/blob/master/examples/coco/upload_coco2017.py#L24)
A fix for this is to keep the dtype="uint8" or something similar while uploading. The Coco example needs to be updated to reflect this.
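
A quick way to spot this problem before calling to_tensorflow is to scan the metadata for "object" dtypes. The sketch below assumes, hypothetically, that meta is a plain dict mapping each tensor name to a dict with "dtype" and "shape" keys; Hub's real meta layout may differ, and `find_object_dtypes` is an illustrative helper, not part of Hub. The "fixed" dtypes are the ones proposed later in this thread.

```python
from typing import Dict, List


def find_object_dtypes(meta: Dict[str, dict]) -> List[str]:
    """Return, sorted, the names of tensors whose meta dtype is "object".

    Assumes a hypothetical layout: {name: {"dtype": str, "shape": tuple}}.
    """
    return sorted(name for name, info in meta.items() if info.get("dtype") == "object")


# Hypothetical meta mirroring the COCO upload script's use of "object".
meta = {
    "image": {"dtype": "object", "shape": (1,)},
    "area": {"dtype": "object", "shape": (1,)},
    "id": {"dtype": "object", "shape": (1,)},
    "bbox": {"dtype": "object", "shape": (1,)},
}

# The same fields with fixed-width numeric dtypes instead.
fixed_meta = {
    "image": {"dtype": "uint32", "shape": (1,)},
    "area": {"dtype": "uint32", "shape": (1,)},
    "id": {"dtype": "uint32", "shape": (1,)},
    "bbox": {"dtype": "uint16", "shape": (1,)},
}

print(find_object_dtypes(meta))        # ['area', 'bbox', 'id', 'image']
print(find_object_dtypes(fixed_meta))  # []
```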

to_tensorflow also fails when it gets shape=(1,) in meta and the actual object has multiple dimensions, for example, an image.
This can be fixed by commenting out this line https://github.com/activeloopai/Hub/blob/master/hub/collections/dataset/core.py#L633, which will set the output_shapes as None by default.
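
To see why shape=(1,) breaks on images while None does not, here is a pure-Python sketch of the shape-compatibility rule TensorFlow applies (the same rule as tf.TensorShape.is_compatible_with): an unknown shape (None) matches anything; otherwise the ranks must agree and every known dimension must match. `shapes_compatible` is an illustrative name, and the 480x640x3 image is a hypothetical sample; in practice TensorFlow performs this check itself when elements are checked against declared output shapes.

```python
from typing import Optional, Sequence


def shapes_compatible(declared: Optional[Sequence[Optional[int]]],
                      actual: Sequence[int]) -> bool:
    """Mimic TensorFlow's shape-compatibility rule.

    declared=None means "unknown rank", which is compatible with any shape.
    A None entry inside declared means "unknown size" for that dimension.
    """
    if declared is None:
        return True
    if len(declared) != len(actual):
        return False  # ranks differ
    return all(d is None or d == a for d, a in zip(declared, actual))


image_shape = (480, 640, 3)  # hypothetical RGB image sample

print(shapes_compatible((1,), image_shape))             # False: rank 1 vs rank 3
print(shapes_compatible(None, image_shape))             # True: unknown shape accepts anything
print(shapes_compatible((None, None, 3), image_shape))  # True: only channels are pinned
```

The last call hints at a middle ground: declaring only the dimensions you actually know keeps some static shape information while still accepting images of varying size.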

to_pytorch works fine in both the above cases.

AbhinavTuli changed the title from "Problems with to_tensorflow" to "Fixes in to_tensorflow method" on Oct 9, 2020
@ADI10HERO
Contributor

> to_tensorflow fails when the meta information for a tensor includes dtype="object" ("object" dtype has been used for images, area, id, bbox in Coco dataset - https://github.com/activeloopai/Hub/blob/master/examples/coco/upload_coco2017.py#L24)
> A fix for this is to keep the dtype="uint8" or something similar while uploading. The Coco example needs to be updated to reflect this.

Changing the values in the dict returned by meta would work?

> The Coco example needs to be updated to reflect this.

And I did not get this part...

> to_tensorflow also fails when it gets shape=(1,) in meta and the actual object has multiple dimensions, for example, an image.
> This can be fixed by commenting out this line https://github.com/activeloopai/Hub/blob/master/hub/collections/dataset/core.py#L633, which will set the output_shapes as None by default.

After I comment out the line, how do I test it?

@AbhinavTuli
Contributor Author

> Changing the values in the dict returned by meta would work?

Yes, changing the dtype in meta should work, but it's better to change it in the call as well, to make it obvious to any user.
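
Changing the dtype in meta amounts to overriding entries in a dict. A minimal sketch, again assuming the hypothetical {name: {"dtype", "shape"}} layout; `override_dtypes` is an illustrative helper, not a Hub function. It works on a copy so the original metadata stays intact, and it rejects names that aren't in meta so typos don't pass silently.

```python
import copy
from typing import Dict


def override_dtypes(meta: Dict[str, dict], dtypes: Dict[str, str]) -> Dict[str, dict]:
    """Return a deep copy of meta with the dtype of the named tensors replaced.

    Raises KeyError for tensor names that do not exist in meta.
    """
    updated = copy.deepcopy(meta)
    for name, dtype in dtypes.items():
        if name not in updated:
            raise KeyError(f"unknown tensor: {name}")
        updated[name]["dtype"] = dtype
    return updated


meta = {
    "area": {"dtype": "object", "shape": (1,)},
    "bbox": {"dtype": "object", "shape": (1,)},
}
fixed = override_dtypes(meta, {"area": "uint32", "bbox": "uint16"})
print(fixed["area"]["dtype"], meta["area"]["dtype"])  # uint32 object
```

As the reply above notes, this only patches the stored metadata; declaring the same dtypes in the call that defines the dataset keeps the intent visible to users.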

> And I did not get this part...

When the user doesn't specify the actual shape and just sets shape=(1,), to_tensorflow strictly expects every sample to have shape (1,). If we comment out https://github.com/activeloopai/Hub/blob/master/hub/collections/dataset/core.py#L633, it will accept samples of any shape.

> After I comment out the line, how do I test it?

Once you make both of these changes, try storing the dataset using https://github.com/activeloopai/Hub/blob/master/examples/coco/upload_coco2017.py.
(Tip: if you use ds.store("./path/to/directory"), the dataset will be stored locally instead of online, which might save you some time.)
Once you have stored it, try loading it with a file similar to https://github.com/activeloopai/Hub/blob/master/examples/load_tf.py
If everything is fine, you shouldn't face any issues in loading.

Also try loading it with PyTorch to make sure the changes didn't break that:
https://github.com/activeloopai/Hub/blob/master/examples/load_pytorch.py

Let me know if you face any issues.

@ADI10HERO
Contributor

> And I did not get this part...

Actually, @AbhinavTuli, by that I meant I didn't understand this part:

> The Coco example needs to be updated to reflect this

So by "example", do you mean the example .py file (https://github.com/activeloopai/Hub/blob/master/examples/coco/upload_coco2017.py) or the example dataset? 😅

@AbhinavTuli
Contributor Author

Essentially both. You'll update the py file and then upload the dataset using the modified file, so both will get updated.

@ADI10HERO
Contributor

Makes sense, got it.

While making changes in the meta file, you mentioned "uint8" or something similar...
The changes I have currently made (on my local system) are:
area --> uint32
bbox --> uint16
id --> uint32
image --> uint32

I chose these values because, after a quick Google search on COCO, I got the feeling that uint8 wouldn't be sufficient.
But I'm not sure what these values mean, so I have two questions:

  1. Are my dtype values correct/okay, if not what should they be?
  2. What is their significance?

@AbhinavTuli
Contributor Author

The dtype is essentially a NumPy dtype that we keep track of as metadata. It helps us store chunks of data efficiently (when chunk_size isn't explicitly specified), and it also helps when converting from the Hub format to other formats.
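
To illustrate why the dtype matters for chunking: knowing the item size lets you estimate how many samples fit into a chunk of a given byte budget. This is a hypothetical heuristic, not Hub's actual chunking logic; `samples_per_chunk`, the `ITEM_SIZE` table, and the 16 MiB target are assumptions chosen for illustration.

```python
from math import prod

# Bytes per element for some common fixed-width dtypes.
ITEM_SIZE = {"uint8": 1, "uint16": 2, "uint32": 4, "uint64": 8,
             "int32": 4, "int64": 8, "float32": 4, "float64": 8}


def samples_per_chunk(shape, dtype, target_bytes=16 * 2**20):
    """Estimate how many samples of a given shape and dtype fit in one chunk.

    Hypothetical heuristic: returns at least 1 even if a single sample is
    larger than target_bytes.
    """
    if dtype not in ITEM_SIZE:
        # e.g. "object": no fixed item size, so no size-based estimate is possible
        raise ValueError(f"no fixed item size for dtype {dtype!r}")
    sample_bytes = prod(shape) * ITEM_SIZE[dtype]
    return max(1, target_bytes // sample_bytes)


image_shape = (480, 640, 3)  # hypothetical RGB image
print(samples_per_chunk(image_shape, "uint8"))   # 18
print(samples_per_chunk(image_shape, "uint32"))  # 4
```

Under these assumptions, storing an image as uint32 takes four times the space of uint8, so each chunk holds 4 samples instead of 18.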

The dtypes you chose should be fine as long as the entire range of values fits in them, which I think they will.
Try saving the dataset locally first to see if everything is working; it'll save you some time. Use ds.store("./path") for this.
Once everything is working you can upload to hub.
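
A quick way to check that "the entire range of values fits" is to pick the narrowest unsigned integer dtype for a column of values. A minimal sketch; `smallest_uint_dtype` is an illustrative helper, not part of Hub or NumPy. It also rejects negative and non-integer values, because an unsigned integer dtype would wrap or truncate them. This is worth checking for fields like area and bbox, which in the COCO annotation files are typically floating-point numbers.

```python
from typing import Iterable

# Unsigned integer dtypes, narrowest first, with their maximum values.
UINT_DTYPES = [("uint8", 2**8 - 1), ("uint16", 2**16 - 1),
               ("uint32", 2**32 - 1), ("uint64", 2**64 - 1)]


def smallest_uint_dtype(values: Iterable[float]) -> str:
    """Return the narrowest unsigned integer dtype that holds every value exactly.

    Raises ValueError for negative or non-integer values, which an unsigned
    integer dtype would wrap or truncate.
    """
    values = list(values)
    for v in values:
        if v < 0 or v != int(v):
            raise ValueError(f"{v!r} does not fit an unsigned integer dtype")
    peak = max(values, default=0)
    for name, limit in UINT_DTYPES:
        if peak <= limit:
            return name
    raise ValueError(f"{peak!r} exceeds uint64")


print(smallest_uint_dtype([0, 128, 255]))  # uint8 (e.g. pixels of an 8-bit image)
print(smallest_uint_dtype([70_000]))       # uint32
```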

AbhinavTuli added a commit that referenced this issue Oct 30, 2020
fixing to_tensorflow method to take any shape #44