[fix] prepare_dict support english and chinese in one lexicon.txt #1693

keanucui · 2023-02-13T11:59:37Z

In tools/fst/prepare_dict.py, Chinese phrases were not split into characters when a bpemodel was given. With this fix, the script can process English and Chinese entries in one lexicon.txt.

xingchensong · 2023-02-13T16:10:20Z

tools/fst/prepare_dict.py

+                if word.encode('utf8').isalpha():
+                    pieces = sp.EncodeAsPieces(word)
+                else:
+                    pieces = word
                if contain_oov(pieces):


Thanks. Could you add a brief comment on those changes (i.e., PR link #1693 and Issue link #1653)?

Hmm, where exactly should I add the comment?

# We assume that the lexicon does not contain code-switch,
# i.e., the word contains both English and Chinese.
# see PR https://github.com/wenet-e2e/wenet/pull/1693
# and Issue https://github.com/wenet-e2e/wenet/issues/1653
if word.encode('utf8').isalpha():
    pieces = sp.EncodeAsPieces(word)
else:
    pieces = word
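For reference, a minimal standalone sketch of the branch suggested above. The names `is_ascii_alpha`, `split_word`, and `toy_encode` are hypothetical and not part of prepare_dict.py; `toy_encode` only stands in for `sp.EncodeAsPieces` so the snippet runs without a trained bpemodel. Note that `bytes.isalpha()` is true only when every byte is an ASCII letter, so a word such as "it's" also takes the character-split path.

```python
def is_ascii_alpha(word: str) -> bool:
    """True only if every UTF-8 byte of `word` is an ASCII letter."""
    return word.encode("utf8").isalpha()


def split_word(word: str, encode_as_pieces) -> list:
    """Mirror the branch from the PR: BPE for ASCII-alphabetic words,
    per-character split for everything else."""
    if is_ascii_alpha(word):
        return list(encode_as_pieces(word))
    # The PR assigns the string itself; list() makes the characters explicit.
    return list(word)


def toy_encode(word: str) -> list:
    """Hypothetical stand-in for sp.EncodeAsPieces, for illustration only."""
    return ["▁" + word]


for w in ["hello", "你好", "it's"]:
    print(w, split_word(w, toy_encode))
```

In the real script the second argument would presumably be `sp.EncodeAsPieces` from the loaded SentencePiece model. Returning a list in both branches is a choice made for this sketch; the PR code leaves `pieces` as a plain string for non-alphabetic words.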

robin1001 · 2023-02-15T02:48:21Z

Please fix the lint problem.

robin1001 · 2023-02-15T02:49:01Z

./tools/fst/prepare_dict.py:44:125: B950 line too long (124 > 80 characters)

[fix] prepare_dict support english and chinese in one lexicon.txt

92f3f89

robin1001 requested a review from xingchensong February 13, 2023 12:08

xingchensong reviewed Feb 13, 2023

add comment

6c0b280

cuidongcai1035 added 3 commits February 15, 2023 10:59

formatting adjustment

c8e4894

formatting adjustment

7af6cab

warnings.warn stacklevel keyword set stacklevel of 2

37caf8f

robin1001 approved these changes Feb 15, 2023

robin1001 merged commit a983da9 into wenet-e2e:main Feb 15, 2023
