
A fast and reliable PHP library for detecting languages


landrok/language-detector


LanguageDetector


LanguageDetector is a PHP library that detects the language of a text string.


Features

  • More than 50 supported languages, including Klingon
  • Very fast, no database needed
  • Packaged with a 2 MB dataset
  • Training is already done, so the library is ready to use
  • Small codebase, small footprint
  • N-gram-based algorithm
  • Supports PHP 5.4, 5.5, 5.6, 7.0, 7.1, 7.2, 7.3, 7.4, 8.0 and HHVM. The latest release, 1.3.x, only supports PHP >= 7.2.
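The feature list mentions an n-gram algorithm. As a rough illustration of the general idea only (this is not the library's internal code), the sketch below builds a character n-gram frequency profile of a text. N-gram detectors generally compare such a profile against precomputed per-language profiles. The function name `ngramProfile()`, the space padding and the simplified lowercasing are assumptions made for this example.

```php
<?php
declare(strict_types=1);

// Illustrative sketch only, NOT LanguageDetector's internal code.
// Hypothetical helper: builds a character n-gram frequency profile.
function ngramProfile(string $text, int $n = 3): array
{
    // Collapse whitespace runs and lowercase. strtolower() is a
    // simplification: it does not reliably lowercase non-ASCII letters.
    $text = strtolower(trim((string) preg_replace('/\s+/u', ' ', $text)));

    // Pad with spaces so word starts and ends produce their own n-grams.
    $text = ' ' . $text . ' ';

    // Split into UTF-8 characters rather than bytes.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    $count = count($chars);

    // Slide a window of $n characters and count each n-gram.
    $profile = [];
    for ($i = 0; $i + $n <= $count; $i++) {
        $gram = implode('', array_slice($chars, $i, $n));
        $profile[$gram] = ($profile[$gram] ?? 0) + 1;
    }

    // Most frequent n-grams first.
    arsort($profile);
    return $profile;
}

print_r(ngramProfile('The the'));
```

With trigrams, "The the" yields " th", "the" and "he " twice each and "e t" once; a real detector would use far longer texts and keep only the top-ranked n-grams of each profile.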

Install

composer require landrok/language-detector

Quick usage

Detect language

Instantiate a detector, pass it a text and get the detected language.

require_once 'vendor/autoload.php';

$text = 'My tailor is rich and Alison is in the kitchen with Bob.';

$detector = new LanguageDetector\LanguageDetector();

$language = $detector->evaluate($text)->getLanguage();

echo $language; // Prints something like 'en'

Once instantiated, the same detector can evaluate multiple texts.

require_once 'vendor/autoload.php';

// An array of texts to evaluate
$texts = [
    'My tailor is rich and Alison is in the kitchen with Bob.',
    'Mon tailleur est riche et Alison est dans la cuisine avec Bob'
];

$detector = new LanguageDetector\LanguageDetector();

foreach ($texts as $key => $text) {

    $language = $detector->evaluate($text)->getLanguage();

    echo sprintf(
        "Text %d, language=%s\n",
        $key,
        $language
    );
}

It should output something like:

Text 0, language=en
Text 1, language=fr
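When many texts are processed, it can be handy to group them by detected language. The helper below is a hypothetical sketch, not part of the library: `groupByLanguage()` takes the detection step as a callable, so it can be wired to a real detector or to a stub for testing.

```php
<?php
declare(strict_types=1);

// Hypothetical helper (not part of LanguageDetector): runs a detection
// callable over several texts and groups the text keys by language.
function groupByLanguage(array $texts, callable $detect): array
{
    $groups = [];
    foreach ($texts as $key => $text) {
        // The callable is expected to return a language code such as 'en'.
        $language = (string) $detect($text);
        $groups[$language][] = $key;
    }
    return $groups;
}
```

With a real detector, the callable can wrap the documented API, for example `groupByLanguage($texts, function ($text) use ($detector) { return $detector->evaluate($text)->getLanguage(); })`.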

A LanguageDetector instance can also be used as a string; it then yields the last detected language.

require_once 'vendor/autoload.php';

$text = 'My tailor is rich and Alison is in the kitchen with Bob.';

$detector = new LanguageDetector\LanguageDetector();

echo $detector->evaluate($text); // Prints something like 'en'
echo $detector; // Prints something like 'en' after an evaluate()

API Methods

evaluate()

Type \LanguageDetector\LanguageDetector

Evaluates a given text and returns the detector instance itself, so calls can be chained.

Example

After an evaluate(), the result is stored and available for later use.

$detector->evaluate('My tailor is rich and Alison is in the kitchen with Bob.');

// Then you have access to the detected language
$detector->getLanguage(); // Returns 'en'

Both calls can be chained on one line.

$detector->evaluate('My tailor is rich and Alison is in the kitchen with Bob.')
         ->getLanguage(); // Returns 'en'

The output of evaluate() can also be printed directly.

// Prints 'en'
echo $detector->evaluate('My tailor is rich and Alison is in the kitchen with Bob.');

getLanguage()

Type string

Returns the detected language code.

Example

$detector->getLanguage(); // Returns 'en'

getLanguages()

Type array

A list of loaded models that will be evaluated.

Example

$detector->getLanguages(); // Returns something like ['de', 'en', 'fr']

getScores()

Type array

A list of scores by language, for all evaluated languages.

Example

$detector->getScores();

// Returns something like
Array
(
    [en] => 0.43950135722745
    [nl] => 0.40898789832569
    [...]
    [ja] => 0
    [fa] => 0
)
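In the example above, the top two scores ('en' and 'nl') are fairly close. If an application needs to reject uncertain results, one option is to require a minimum gap between the best and second-best scores. The helper below is a hypothetical sketch, not part of the library; `confidentLanguage()` and its 0.05 default margin are arbitrary choices for illustration, and it only assumes the array shape shown above (language code => score).

```php
<?php
declare(strict_types=1);

// Hypothetical helper (not part of LanguageDetector): returns the best
// language only if it beats the runner-up by at least $minMargin,
// otherwise null. The default margin is an arbitrary example value.
function confidentLanguage(array $scores, float $minMargin = 0.05): ?string
{
    if ($scores === []) {
        return null;
    }

    // Sort from highest to lowest score, keeping the language keys.
    arsort($scores);
    $languages = array_keys($scores);
    $best = $languages[0];

    // With a single candidate there is no runner-up to compare against.
    if (count($languages) === 1) {
        return $best;
    }

    $margin = $scores[$best] - $scores[$languages[1]];
    return $margin >= $minMargin ? $best : null;
}

// Sample values taken from the getScores() example above.
$scores = ['en' => 0.43950135722745, 'nl' => 0.40898789832569, 'ja' => 0, 'fa' => 0];
var_dump(confidentLanguage($scores));
```

With a live detector, the call would be something like `confidentLanguage($detector->getScores(), 0.02)` after an evaluate(). A suitable margin depends on your texts and should be tuned empirically.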

getSupportedLanguages()

Type array

A list of supported languages that will be evaluated.

Example

$detector->getSupportedLanguages();

// Returns something like
Array
(
    [0] => af
    [1] => ar
    [...]
    [51] => zh-cn
    [52] => zh-tw
)
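When a list of language codes comes from configuration or user input, it can be useful to check it against getSupportedLanguages() before relying on it. The helper below is a hypothetical sketch, not part of the library: `splitSupported()` takes the supported list as a plain array, so it can be tried without loading any model. The sample list in the test is a subset of the example output above.

```php
<?php
declare(strict_types=1);

// Hypothetical helper (not part of LanguageDetector): splits requested
// language codes into supported and unsupported ones, keeping the
// requested order.
function splitSupported(array $requested, array $supported): array
{
    $kept = [];
    $dropped = [];
    foreach ($requested as $code) {
        // Strict comparison avoids loose matches between strings and ints.
        if (in_array($code, $supported, true)) {
            $kept[] = $code;
        } else {
            $dropped[] = $code;
        }
    }
    return ['kept' => $kept, 'dropped' => $dropped];
}
```

In practice, `$supported` would come from `$detector->getSupportedLanguages()`, and the `kept` list could then be used to restrict the loaded models.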

getText()

Type string

Returns the last evaluated string.

Example

$detector->getText();

// Returns 'My tailor is rich and Alison is in the kitchen with Bob.'

Options

Type \LanguageDetector\LanguageDetector

For better performance, the loaded models can be restricted by passing a list of language codes as the second constructor argument.

Example

$text = 'My tailor is rich and Alison is in the kitchen with Bob.';

$detector = new LanguageDetector\LanguageDetector(null, ['en', 'fr', 'de']);

$language = $detector->evaluate($text);

echo $language; // Prints something like 'en'

For one-liners only

Type \LanguageDetector\LanguageDetector

The static detect() method evaluates a given text in a single call.

Example

echo LanguageDetector\LanguageDetector::detect(
    'My tailor is rich and Alison is in the kitchen with Bob.'
); // Prints 'en'

detect() returns a LanguageDetector instance, so all API methods are available on the result.

$detector = LanguageDetector\LanguageDetector::detect(
    'My tailor is rich and Alison is in the kitchen with Bob.'
);

// en
echo $detector;

// en
echo $detector->getLanguage();

// An array of all scores, see API method
print_r($detector->getScores());

// An array of all supported languages, see API method
print_r($detector->getSupportedLanguages());

// The last evaluated string
echo $detector->getText();

// Limit loaded languages for even better performance
echo LanguageDetector\LanguageDetector::detect(
    'My tailor is rich and Alison is in the kitchen with Bob.',
    ['en', 'de', 'fr', 'es']
); // en