Hello,
I have a repository whose refs and commit messages are in Cyrillic:
>>> import os
>>> list(os.walk(b'refs/heads'))
[(b'refs/heads', [], [b'\xcd\xee\xe2\xe0\xff\xe2\xe5\xf2\xea\xe01', b'master'])]
>>> s = b'\xcd\xee\xe2\xe0\xff\xe2\xe5\xf2\xea\xe01'
>>> s.decode('latin1')
'Íîâàÿâåòêà1' # seems like rubbish
>>> s.decode('cp1251')
'Новаяветка1' # looks like Russian; Google Translate agrees: `newlight1`
... and somehow that makes dulwich break:
$ python3
>>> from dulwich.repo import Repo
>>> r = Repo('.')
>>> r.refs
DiskRefsContainer('.')
>>> r.refs.allkeys()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3/dist-packages/dulwich/refs.py", line 470, in allkeys
    sys.getfilesystemencoding())
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 11-20: surrogates not allowed
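From what I can tell, the same error can be reproduced without dulwich. This is only a sketch under my own assumption that the ref name was decoded with the `surrogateescape` error handler (which Python 3 uses for filesystem paths) and later encoded strictly; I have not verified that this is exactly what refs.py does:

```python
# Raw ref name as stored on disk (the cp1251-encoded bytes shown above).
raw = b'\xcd\xee\xe2\xe0\xff\xe2\xe5\xf2\xea\xe01'

# Assumption: on a UTF-8 system the name reaches Python as a str in which
# every undecodable byte was mapped to a lone surrogate ("surrogateescape").
name = raw.decode('utf-8', 'surrogateescape')

# A strict encode back to UTF-8 rejects those surrogates.
try:
    name.encode('utf-8')
    strict_error = None
except UnicodeEncodeError as exc:
    strict_error = exc

# Encoding with the same error handler round-trips the original bytes.
round_tripped = name.encode('utf-8', 'surrogateescape')

print(strict_error)
print(round_tripped == raw)  # True
```

If that assumption holds, encoding with `surrogateescape` (or keeping the ref names as bytes throughout) would avoid the exception.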
From my understanding of the documentation, I don't think this is the expected behavior.
Do you know how I could work around this?
Thanks for your help.
Cheers,