isTraditional and similar functions misjudge Traditional Chinese characters #46

Open
@SoftBlackSheep

Description

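When the dictionary files are loaded, every entry is split into individual characters and all of them are added to tSet. However, not every character in a traditional phrase is itself a traditional character, so many simplified characters end up in tSet and cause false positives when the set is used.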

public Set<String> tChars() {
        // DCL (double-checked locking): make sure initialization happens only once
        if(CollectionUtil.isNotEmpty(tSet)) {
            return tSet;
        }

        if(CollectionUtil.isEmpty(tSet)) {
            synchronized (tSet) {
                // DCL: second check, inside the lock
                if(CollectionUtil.isEmpty(tSet)) {
                    // Traditional => Simplified, phrases
                    Map<String, List<String>> tsPhrase = this.tsPhrase();
                    this.addCharToSet(tSet, tsPhrase.keySet());

                    // Traditional => Simplified, single characters
                    Map<String, List<String>> tsChar = this.tsChar();
                    this.addCharToSet(tSet, tsChar.keySet());

                    // Simplified => Traditional, phrases
                    Map<String, List<String>> stPhrase = this.stPhrase();
                    for(Map.Entry<String, List<String>> entry : stPhrase.entrySet()) {
                        this.addCharToSet(tSet, entry.getValue());
                    }

                    // Simplified => Traditional, single characters
                    Map<String, List<String>> stChar = this.stChar();
                    for(Map.Entry<String, List<String>> entry : stChar.entrySet()) {
                        this.addCharToSet(tSet, entry.getValue());
                    }

                    // Plain-text dictionary
                    List<String> tcLines = StreamUtil.readAllLines("/data/dictionary/tc.txt");
                    for(String line : tcLines) {
                        tSet.addAll(StringUtil.toCharStringList(line));
                    }
                }
            }
        }

        return tSet;
    }
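
To make the problem concrete, here is a minimal, self-contained sketch. It does not use the library's own classes: `TraditionalCharSetSketch`, `naiveChars`, and `differingChars` are hypothetical names, and the single-entry map stands in for whatever `tsPhrase()` returns. The first method mirrors the per-character split described above; the second is one possible fix, assuming a traditional phrase and its simplified form can be aligned position by position, which keeps a character only where the two forms differ.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class TraditionalCharSetSketch {

    /** Mirrors the reported behavior: every character of a traditional phrase is added. */
    public static Set<String> naiveChars(Map<String, List<String>> tsPhrase) {
        Set<String> result = new HashSet<>();
        for (String phrase : tsPhrase.keySet()) {
            // Iterate by code point so characters outside the BMP stay intact.
            phrase.codePoints().forEach(cp -> result.add(new String(Character.toChars(cp))));
        }
        return result;
    }

    /**
     * Hypothetical fix: add a character only if it differs from the
     * simplified character at the same position.
     */
    public static Set<String> differingChars(Map<String, List<String>> tsPhrase) {
        Set<String> result = new HashSet<>();
        for (Map.Entry<String, List<String>> entry : tsPhrase.entrySet()) {
            int[] trad = entry.getKey().codePoints().toArray();
            for (String simplified : entry.getValue()) {
                int[] simp = simplified.codePoints().toArray();
                if (trad.length != simp.length) {
                    continue; // cannot align position by position, skip this pair
                }
                for (int i = 0; i < trad.length; i++) {
                    if (trad[i] != simp[i]) {
                        result.add(new String(Character.toChars(trad[i])));
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in for tsPhrase(): one traditional phrase and its simplified form.
        Map<String, List<String>> tsPhrase = Map.of("繁體中文", List.of("繁体中文"));

        System.out.println(naiveChars(tsPhrase).contains("中"));     // true: shared character is wrongly included
        System.out.println(differingChars(tsPhrase).contains("中")); // false
        System.out.println(differingChars(tsPhrase).contains("體")); // true
    }
}
```

With the naive split, 繁, 中 and 文 all land in the set even though they are written the same way in both scripts, so a check against that set reports them as traditional. The same filtering would presumably have to be applied to the other sources loaded in `tChars()` (the values of `stPhrase` and the lines of `tc.txt`), which this sketch does not cover.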
