Closed
❓ Questions and Help
This is probably not an issue with PyTorch itself. I am trying to torch.save an op whose pybind11 pickle state contains raw bytes. Here is the pybind11 registration link.
How do I torch.save the following op?
import torchtext
import torch
from torchtext.experimental.transforms import PRETRAINED_SP_MODEL, load_sp_model
sp_model_path = torchtext.utils.download_from_url(PRETRAINED_SP_MODEL['text_unigram_25000'])
sp_model = load_sp_model(sp_model_path)
f = open("temp.pt", "wb")
torch.save(sp_model, f)
The error message I got:
---------------------------------------------------------------------------
UnicodeDecodeError Traceback (most recent call last)
<ipython-input-7-c66cdf260b7d> in <module>
1 f = open("temp.pt", "wb")
----> 2 torch.save(sp_model, f)
~/tmp/PyTorch/pytorch/torch/serialization.py in save(obj, f, pickle_module, pickle_protocol, _use_new_zipfile_serialization)
370 if _use_new_zipfile_serialization:
371 with _open_zipfile_writer(opened_file) as opened_zipfile:
--> 372 _save(obj, opened_zipfile, pickle_module, pickle_protocol)
373 return
374 _legacy_save(obj, opened_file, pickle_module, pickle_protocol)
~/tmp/PyTorch/pytorch/torch/serialization.py in _save(obj, zip_file, pickle_module, pickle_protocol)
474 pickler = pickle_module.Pickler(data_buf, protocol=pickle_protocol)
475 pickler.persistent_id = persistent_id
--> 476 pickler.dump(obj)
477 data_value = data_buf.getvalue()
478 zip_file.write_record('data.pkl', data_value, len(data_value))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 119: invalid start byte
cc @mruberry
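The error is ordinary UTF-8 decoding: byte 0xC0 can never start a valid UTF-8 sequence, so any step that turns the serialized model's raw bytes into a Python str fails like this. A minimal stdlib-only illustration follows; the payload is a made-up stand-in, not the real SentencePiece model, and try_decode is a hypothetical helper that mimics a binding converting bytes to str.

```python
# Hypothetical binary payload standing in for a serialized model.
# 0xc0 is never a valid UTF-8 start byte.
payload = b"\x0a\x05hello\xc0\x01"


def try_decode(data: bytes) -> str:
    """Mimic a binding that converts raw bytes to str via UTF-8 (hypothetical)."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return f"UnicodeDecodeError: {exc}"


result = try_decode(payload)
print(result)
```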
mthrok commented on Dec 4, 2020
This is about the usage of PyBind11. See https://pybind11.readthedocs.io/en/stable/advanced/cast/strings.html#return-c-strings-without-conversion
Changing the binding as follows makes it work:
https://github.com/mthrok/text/blob/b0f88ba56603590444a5c10e216573ebce0740ad/torchtext/csrc/register_bindings.cpp#L52-L74
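The linked pybind11 page describes returning py::bytes instead of std::string so that pybind11 hands Python the raw bytes without UTF-8 conversion. The sketch below is a pure-Python analogue of that idea, not the torchtext binding itself: SerializedModel is a hypothetical class whose pickle state is kept as bytes end to end, so it round-trips through pickle even though the data is not valid UTF-8.

```python
import pickle


class SerializedModel:
    """Hypothetical stand-in for a bound op whose state is a binary blob."""

    def __init__(self, proto: bytes) -> None:
        self.proto = proto

    def __getstate__(self) -> bytes:
        # Return the raw bytes unchanged (the analogue of returning
        # py::bytes from the C++ pickle getter); nothing is decoded.
        return self.proto

    def __setstate__(self, state: bytes) -> None:
        # Restore directly from the bytes produced by __getstate__.
        self.proto = state


model = SerializedModel(b"\x0a\x05hello\xc0\x01")
restored = pickle.loads(pickle.dumps(model))
print(restored.proto == model.proto)
```

Keeping the state as bytes on both sides is what avoids the decode step; converting it to str anywhere along the way reintroduces the UnicodeDecodeError.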
zhangguanheng66 commented on Dec 7, 2020
Thanks @mthrok, what about the torchbind one? I saw a similar issue when saving and loading the torchbind SP model. Since you found the solution, would you like to send out a PR to land pybind pickle support for the SP model?
mthrok commented on Dec 8, 2020
@zhangguanheng66
Is there an issue reported that I can look at?
zhangguanheng66 commented on Dec 8, 2020
Yup. Here is the code snippet to reproduce the serialization issue with torchbind sentencepiece model
The same issue is observed with the pybind one.
mthrok commented on Dec 8, 2020
Okay, I believe this is not a PyTorch issue, so I am transferring it to torchtext.
mthrok commented on Dec 23, 2020
Addressed in #1104