-
安装与使用介绍,整理包括以下工具
- LAC
- Jieba
- THULAC
- pkuseg
- pyhanlp
- StandfordNLP
- LTP
- SnowNLP
- PyNLPIR
- foolnltk
-
调用分词
依照上述介绍安装好工具后,可通过我们封装好的接口,以统一的形式调用任一个工具的分词,代码如下所示:
from tools_wapper import * # 若使用StandfordNLP或者pyltp需要设置字典路径 standforddict = "./stanford-corenlp-4.0.0/" pyltpdict = "./ltp_data_v3.4.0/" # 选择想要调用的分词工具,需要安装,若未安装工具从列表中删除即可 tools = ['lac', 'jieba', 'pkuseg', 'thulac', 'pynlpir', 'pyhanlp', 'foolnltk', 'snownlp', ('standfordnlp', standforddict), ('pyltp', pyltpdict)] # 遍历调用所有工具 for tool in tools: if isinstance(tool, tuple): cutter = eval(tool[0] + "_impl")(tool[1]) tool = tool[0] else: cutter = eval(tool + "_impl")() print(tool, cutter.cut("人生自古谁无死,留取丹心照汗青"))
-
调用词性标注
依照上述介绍安装好工具后,可通过我们封装好的接口,以统一的形式调用任一个工具的词性标注,代码如下所示:
from tools_wapper import * # 若使用StandfordNLP或者pyltp需要设置字典路径 standforddict = "./stanford-corenlp-4.0.0/" pyltpdict = "./ltp_data_v3.4.0/" # 选择想要调用的分词工具,需要安装,若未安装工具从列表中删除即可 tools = ['lac', 'jieba', 'pkuseg', 'thulac', 'pynlpir', 'pyhanlp', 'foolnltk', 'snownlp', ('standfordnlp', standforddict), ('pyltp', pyltpdict)] # 遍历调用所有工具 for tool in tools: if isinstance(tool, tuple): cutter = eval(tool[0] + "_impl")(tool[1], 'postag') tool = tool[0] else: cutter = eval(tool + "_impl")('postag') print(tool, cutter.cut("人生自古谁无死,留取丹心照汗青"))
-
Notifications
You must be signed in to change notification settings - Fork 1
中文分词、词性标注、实体识别的工具整理;相关数据集整理与预处理;通用评测脚本脚本【Benchmark of Chinese Word Segment (CWS)、Part of Speech Tagging(PosTag)、Name Entity Recognition (NER) 】
License
Bond-H/lac_tools_benchment
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
About
中文分词、词性标注、实体识别的工具整理;相关数据集整理与预处理;通用评测脚本脚本【Benchmark of Chinese Word Segment (CWS)、Part of Speech Tagging(PosTag)、Name Entity Recognition (NER) 】
Resources
License
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published