Cyrillic turned into Chinese

Hi

I have two RTF files containing Cyrillic text. Both open correctly in MS Word.
The first converts fine with:

```
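from striprtf.striprtf import rtf_to_text

# Read the RTF source and strip the markup; undecodable bytes are skipped.
with open(fullpath) as infile:
    content = infile.read()
    text = rtf_to_text(content, errors="ignore")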
```

The second (bad.zip) comes out as Chinese characters instead of Cyrillic.

[good.zip](https://github.com/joshy/striprtf/files/7814670/good.zip)
[bad.zip](https://github.com/joshy/striprtf/files/7814671/bad.zip)

Sample output from the good file:
```
>>> tabtext = text.split("|||")
>>> print(tabtext[0])
Таблиця розподілу номерного ресурсу
Кіровоградська область|
Код зони - 52
```

Sample output from the bad file:
```
>>> tabtext = text.split("|")
>>> print(tabtext[0])
亦犭桷 痤顼钿畴 眍戾痦钽 疱耋瘃
它獬怦赅 钺豚耱鼃
暑 珙龛 - 32
```

If I leave out the "ignore", I get:
**UnicodeDecodeError: 'gbk' codec can't decode byte 0xff in position 6: illegal multibyte sequence**
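The error position and the garbled output both look consistent with Windows-1251 (Cyrillic) bytes being decoded as GBK. Here is a minimal sketch that reproduces the symptom without striprtf. It assumes the first word in the bad file is "Таблиця", as in the good file, and that the text is stored as cp1251 bytes; neither has been confirmed against the file itself.

```python
# Assumption: the first word of the bad file is "Таблиця", stored as cp1251 bytes.
raw = "Таблиця".encode("cp1251")

# The final letter "я" is the single byte 0xFF; find where it sits.
last_byte_index = raw.index(0xFF)

# Lenient GBK decoding pairs the bytes up into CJK characters and
# silently drops the 0xFF byte, which cannot start a valid GBK sequence.
garbled = raw.decode("gbk", errors="ignore")

# Strict GBK decoding raises on that same byte instead.
try:
    raw.decode("gbk")
    strict_error = None
except UnicodeDecodeError as exc:
    strict_error = exc

print(last_byte_index)  # index of the offending byte
print(garbled)          # CJK characters in place of the Cyrillic word
```

The offending byte lands at index 6, the same position reported in the traceback, which suggests the bad file is being decoded with a Chinese code page rather than a Cyrillic one.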

Any idea how I can work around this?
