结巴分词 PHP 实现 - The Jieba Chinese Word Segmentation Implemented in PHP
A wrapper around jieba-rs built on the FFI extension introduced in PHP 7.4.
Requires PHP >= 7.4 with the FFI extension enabled.
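If you are unsure whether FFI is available in your environment, a minimal check (purely illustrative, not part of the package) is:

```php
<?php

// Should print bool(true) when the FFI extension is loaded and enabled.
var_dump(extension_loaded('FFI'));
```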
You can install the package via composer:
composer require binaryoung/jieba-php
use Binaryoung\Jieba\Jieba;
var_dump(Jieba::cut('PHP是世界上最好的语言!'));
array cut(string $sentence, bool $hmm = true)
array cutAll(string $sentence)
array cutForSearch(string $sentence, bool $hmm = true)
array TFIDFExtract(string $sentence, int $topK = 20, array $allowedPOS = [])
array textRankExtract(string $sentence, int $topK = 20, array $allowedPOS = [])
array tokenize(string $sentence, string $mode = 'default', bool $hmm = true)
array tag(string $sentence, bool $hmm = true)
int suggestFrequency(string $segment)
self addWord(string $word, ?int $frequency = null, ?string $tag = null)
self useDictionary(string $path)
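As a rough sketch of how the methods above might be combined, assuming they are all callable statically in the same style as `Jieba::cut()` in the quick-start example (the dictionary path below is a hypothetical placeholder):

```php
<?php

use Binaryoung\Jieba\Jieba;

$sentence = 'PHP是世界上最好的语言!';

// Full-mode and search-engine-mode segmentation
var_dump(Jieba::cutAll($sentence));
var_dump(Jieba::cutForSearch($sentence));

// Part-of-speech tagging and TF-IDF keyword extraction (top 5 keywords)
var_dump(Jieba::tag($sentence));
var_dump(Jieba::TFIDFExtract($sentence, 5));

// Tune the dictionary: add a custom word and load a user dictionary
// ('/path/to/user_dict.txt' is a hypothetical path)
Jieba::addWord('最好的语言');
Jieba::useDictionary('/path/to/user_dict.txt');
var_dump(Jieba::suggestFrequency('最好的语言'));
```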
See examples/example.php.
composer example
composer test
composer bench
Compared against jukuball/jieba-php by segmenting every line of the novel Fortress Besieged (《围城》) as a separate sentence, repeated for 50 iterations, with both libraries using the HMM segmentation mode.
Name | Total time | Time per iteration | Memory usage | Peak memory |
---|---|---|---|---|
jukuball/jieba-php | 51.593 | 1.032 | 493.00MB | 515.03MB |
binaryoung/jieba-php | 8.408 | 0.16816 | 10.00MB | 22.01MB |
Difference | ↓513.59% | ↓513.59% | ↓4830.00% | ↓2240.20% |
Please see CHANGELOG for more information on what has changed recently.
Please see CONTRIBUTING for details.
If you discover any security related issues, please email me instead of using the issue tracker.
The MIT License (MIT). Please see License File for more information.