library text-language-detect

Detects the language of a given piece of text.

webmil/text-language-detect

  • Thursday, September 25, 2014
  • by Webmil
  • Repository
  • 5 Watchers
  • 42 Stars
  • 39,885 Installations
  • PHP
  • 1 Dependents
  • 0 Suggesters
  • 24 Forks
  • 5 Open issues
  • 1 Versions
  • 2 % Grown

The README.md

Text Language Detect

Detects the language of a given piece of text.

The package attempts to detect the language of a sample of text by comparing the sample's ranked 3-gram (trigram) frequencies against a table of 3-gram frequencies for known languages.

It implements a version of a technique originally proposed by Cavnar & Trenkle (1994): "N-Gram-Based Text Categorization".
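The ranking idea can be sketched in a few lines of PHP. This is an illustration of the Cavnar & Trenkle "out-of-place" measure, not the package's own code: the function names (`trigramRanks`, `outOfPlaceDistance`, `closestLanguage`), the 300-trigram limit, the fixed penalty, and the tiny reference texts are all assumptions made for the example. It also assumes PHP 7.1+ with mbstring and valid UTF-8 input, and it reports a raw distance where lower means closer.

```php
<?php

// Illustrative sketch only, NOT the package's implementation.

/** Map each trigram in $text to its frequency rank (0 = most frequent). */
function trigramRanks(string $text, int $limit = 300): array
{
    // Lowercase, then collapse every run of non-letters into one space.
    $clean = preg_replace('/[^\p{L}]+/u', ' ', mb_strtolower($text, 'UTF-8'));
    $clean = ' ' . trim($clean) . ' ';

    $counts = [];
    $length = mb_strlen($clean, 'UTF-8');
    for ($i = 0; $i + 3 <= $length; $i++) {
        $gram = mb_substr($clean, $i, 3, 'UTF-8');
        $counts[$gram] = ($counts[$gram] ?? 0) + 1;
    }

    // Most frequent first; ties broken alphabetically so ranks are deterministic.
    $grams = array_keys($counts);
    usort($grams, function ($a, $b) use ($counts) {
        return ($counts[$b] <=> $counts[$a]) ?: strcmp($a, $b);
    });

    return array_flip(array_slice($grams, 0, $limit));
}

/** Sum of rank differences; trigrams missing from $reference cost $penalty. */
function outOfPlaceDistance(array $sample, array $reference, int $penalty = 300): int
{
    $distance = 0;
    foreach ($sample as $gram => $rank) {
        $distance += isset($reference[$gram])
            ? abs($rank - $reference[$gram])
            : $penalty;
    }
    return $distance;
}

/** Return the key of the reference profile with the smallest distance. */
function closestLanguage(string $text, array $references): ?string
{
    $sample = trigramRanks($text);
    $best = null;
    $bestDistance = PHP_INT_MAX;
    foreach ($references as $language => $profile) {
        $distance = outOfPlaceDistance($sample, $profile);
        if ($distance < $bestDistance) {
            $best = $language;
            $bestDistance = $distance;
        }
    }
    return $best;
}

// Toy reference profiles (hypothetical; real tables are built from large corpora).
$references = [
    'english' => trigramRanks('the quick brown fox jumps over the lazy dog and then the other thing'),
    'german'  => trigramRanks('der schnelle braune fuchs springt über den faulen hund und das ist der andere'),
];

echo closestLanguage('Das ist ein Text in deutscher Sprache.', $references), "\n"; // german
```

With reference texts this small the result depends mostly on how many trigrams are shared at all, so a real table needs far more training text per language.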

This is a fork of Text_LanguageDetect 0.3.0 (alpha).

Dependencies:

PHP Version: PHP 5.3 or newer
PHP Extension: pcre
PHP Extension: mbstring (optional)
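To confirm that a runtime meets the requirements listed above, a small check like the following could be run before using the library. It is a sketch: `missingRequirements` is a hypothetical helper, not part of the package, and it only reports whether the optional mbstring extension is present.

```php
<?php

// Hypothetical helper: lists which hard requirements the current runtime lacks.
// Written in PHP 5.3-compatible syntax so it also parses on old interpreters.
function missingRequirements()
{
    $missing = array();
    if (version_compare(PHP_VERSION, '5.3.0', '<')) {
        $missing[] = 'php >= 5.3.0';
    }
    if (!extension_loaded('pcre')) {
        $missing[] = 'ext-pcre';
    }
    return $missing;
}

$missing = missingRequirements();
if ($missing) {
    fwrite(STDERR, 'Missing requirements: ' . implode(', ', $missing) . "\n");
}
if (!extension_loaded('mbstring')) {
    echo "Note: optional extension mbstring is not loaded.\n";
}
```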

Usage example

<?php

use TextLanguageDetect\TextLanguageDetect;
use TextLanguageDetect\LanguageDetect\TextLanguageDetectException;

$l = new TextLanguageDetect();

echo "Supported languages:\n";
try {
    $langs = $l->getLanguages();
    sort($langs);
    echo implode(', ', $langs) . "\n\n";
} catch (TextLanguageDetectException $e) {
    die($e->getMessage());
}

$text = <<<EOD
Hallo! Das ist ein Text in deutscher Sprache.
Mal sehen, ob die Klasse erkennt, welche Sprache das hier ist.
EOD;

try {
    // Return 2-letter language codes only.
    $l->setNameMode(2);

    // Ask for the 4 best-matching languages.
    $result = $l->detect($text, 4);
    print_r($result);
} catch (TextLanguageDetectException $e) {
    die($e->getMessage());
}

Output:

// output
Supported languages:
albanian, arabic, azeri, bengali, bulgarian, cebuano, croatian, czech,
danish, dutch, english, estonian, farsi, finnish, french, german, hausa,
hawaiian, hindi, hungarian, icelandic, indonesian, italian, kazakh, kyrgyz,
latin, latvian, lithuanian, macedonian, mongolian, nepali, norwegian, pashto,
pidgin, polish, portuguese, romanian, russian, serbian, slovak, slovene, somali,
spanish, swahili, swedish, tagalog, turkish, ukrainian, urdu, uzbek, vietnamese,
welsh

Array
(
    [de] => 0.40703703703704
    [nl] => 0.2880658436214
    [en] => 0.28333333333333
    [da] => 0.23452674897119
)
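A caller usually wants one answer rather than a ranked list. The sketch below picks a single language from an array shaped like the output above, assuming (as the sample suggests) that a higher score means a closer match. `pickLanguage` and both thresholds are hypothetical and not part of the package; the scores are copied from the sample output. It needs PHP 7.1+ for the nullable return type.

```php
<?php

/**
 * Hypothetical helper: return the top-scoring code, or null when the result
 * is too weak or too close to the runner-up to trust.
 */
function pickLanguage(array $scores, float $minScore = 0.3, float $minMargin = 0.05): ?string
{
    if ($scores === []) {
        return null;
    }

    arsort($scores); // highest score first
    $codes = array_keys($scores);
    $best = $scores[$codes[0]];
    $runnerUp = isset($codes[1]) ? $scores[$codes[1]] : 0.0;

    if ($best < $minScore || $best - $runnerUp < $minMargin) {
        return null;
    }
    return $codes[0];
}

// Scores copied from the sample output above.
$result = [
    'de' => 0.40703703703704,
    'nl' => 0.2880658436214,
    'en' => 0.28333333333333,
    'da' => 0.23452674897119,
];

var_dump(pickLanguage($result)); // string(2) "de"
```

The margin check matters for short or mixed-language samples, where the top two scores can be nearly equal.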

Author

Nicholas Pisarro - infinityminusnine+pear@gmail.com

License

BSD (http://www.debian.org/misc/bsd.license)

The Versions

25/09 2014

dev-master

9999999-dev http://pear.php.net/package/Text_LanguageDetect

BSD

The Requires

  • php >=5.3.0

 

by Nicholas Pisarro
by Popadjuk Oleh
