UnicodeDecodeError when loading spacy #539

Closed
dterg opened this issue Oct 20, 2016 · 9 comments
Labels
bug Bugs and behaviour differing from documentation

Comments

@dterg

dterg commented Oct 20, 2016

Whether I use:

from spacy.en import English
nlp = English()

or


import spacy
nlp = spacy.load('en')

I get the error:

nlp = spacy.load('en')
return cls(path=path, **overrides)
if 'vocab' not in overrides \
lemmatizer = cls.create_lemmatizer(nlp)
return Lemmatizer.load(nlp.path)
rules = json.load(file_)
return codecs.charmap_decode(input,self.errors,decoding_table)[0]

UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 565: character maps to <undefined>

Could this be an encoding issue, since Python 2.7 handles encoding differently than 3.x? Although, if I recall correctly, I used spaCy on Python 2.7 without any issues before.

@honnibal honnibal added the bug Bugs and behaviour differing from documentation label Oct 20, 2016
@honnibal
Member

Thanks. I must have opened the file incorrectly. I need to add tests for different encoding environment variables to my Travis config.
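For readers hitting the same traceback, here is a minimal sketch of the likely failure mode. It assumes the JSON rules file is UTF-8 and was opened in text mode without an explicit encoding, so that Windows fell back to a code page such as cp1252. The file name `rules.json`, the helper `load_json_utf8`, and the sample payload are hypothetical and are not taken from spaCy's source.

```python
# -*- coding: utf-8 -*-
import io
import json
import os
import tempfile


def load_json_utf8(path):
    """Read a JSON file as UTF-8, independent of the locale's default."""
    with io.open(path, "r", encoding="utf8") as file_:
        return json.load(file_)


# Hypothetical stand-in for the data file: U+201D (right double quotation
# mark) is encoded in UTF-8 as the bytes E2 80 9D.
path = os.path.join(tempfile.mkdtemp(), "rules.json")
with io.open(path, "wb") as file_:
    file_.write(b'{"quote": "\xe2\x80\x9d"}')

# Decoding the same bytes as cp1252 fails, because 0x9d is unassigned there.
with io.open(path, "rb") as file_:
    raw = file_.read()
try:
    raw.decode("cp1252")
    cp1252_failed = False
except UnicodeDecodeError:
    cp1252_failed = True

# Passing the encoding explicitly avoids the dependency on the locale.
rules = load_json_utf8(path)
```

Using `io.open` with `encoding="utf8"` behaves the same on Python 2.7 and 3.x, which is why it is used here rather than the built-in `open`.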

Are you using Python 2.7 or Python 3.5?

@dterg
Author

dterg commented Oct 20, 2016

Python 2.7(.12)

@honnibal
Member

While I fix the bug, try doing

export LC_ALL=en_US.UTF8
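To check whether a workaround like this has any effect, it can help to print the encodings Python has picked up from the environment. This is only a diagnostic sketch; the helper name `describe_default_encodings` is made up for illustration, and the values it reports depend on the platform and Python version.

```python
import locale
import sys


def describe_default_encodings():
    """Collect the encodings Python derives from the current environment."""
    return {
        # Used by io.open() for text mode when no encoding argument is given.
        "preferred": locale.getpreferredencoding(False),
        # Used for file names.
        "filesystem": sys.getfilesystemencoding(),
        # Used for implicit str/bytes conversions.
        "default": sys.getdefaultencoding(),
    }


if __name__ == "__main__":
    for name, value in sorted(describe_default_encodings().items()):
        print("%s: %s" % (name, value))
```

If `preferred` reports something like `cp1252` or `ANSI_X3.4-1968` rather than UTF-8, files opened without an explicit encoding will be decoded with that codec.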

@dterg
Author

dterg commented Oct 20, 2016

Sorry, I forgot to add that I'm running on Windows.

@honnibal
Member

Ah. I hope you'll continue reporting problems if you have them :). We're running a bit blind on Windows at the moment.

I've gotten the CI to test the null encoding environment now, and I've turned the test from red to green. I'll push the fix to PyPI.
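The actual CI configuration is not shown in this thread. As a rough sketch of what such a regression check could look like, the snippet below runs a child interpreter with `LC_ALL` and `LANG` forced to `C` and verifies that a UTF-8 JSON file still loads when the encoding is passed explicitly. The names `CHILD`, `loads_under_c_locale`, and `rules.json` are hypothetical.

```python
import io
import os
import subprocess
import sys
import tempfile

# Script run in the child process: load the JSON file as UTF-8 and exit
# with status 0 only if the non-ASCII value round-trips correctly.
CHILD = (
    "import io, json, sys\n"
    "with io.open(sys.argv[1], 'r', encoding='utf8') as f:\n"
    "    data = json.load(f)\n"
    "sys.exit(0 if data['quote'] == u'\\u201d' else 1)\n"
)


def loads_under_c_locale(path):
    """Return True if the file loads in a child run under the C locale."""
    env = dict(os.environ)  # keep PATH etc. so the interpreter can start
    env.update({"LC_ALL": "C", "LANG": "C"})
    env.pop("PYTHONIOENCODING", None)
    status = subprocess.call([sys.executable, "-c", CHILD, path], env=env)
    return status == 0


# Hypothetical data file containing U+201D encoded as UTF-8.
path = os.path.join(tempfile.mkdtemp(), "rules.json")
with io.open(path, "wb") as file_:
    file_.write(b'{"quote": "\xe2\x80\x9d"}')

result = loads_under_c_locale(path)
```

Note that these variables do not control the default encoding on Windows, so a check like this mainly covers POSIX environments; a Windows CI runner would still be needed to catch the cp1252 case directly.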

@dterg
Author

dterg commented Oct 20, 2016

Updated through pip (spacy 1.0.3), but it seems the issue is persisting :/

@honnibal
Member

The fixed version is 1.0.4 — I think you nipped in just ahead of the upload. Try now.

@dterg
Author

dterg commented Oct 20, 2016

Worked like a charm. Thank you for your time to fix this!

@dterg dterg closed this as completed Oct 20, 2016
@lock

lock bot commented May 9, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators May 9, 2018