2017 © Pedro Peláez
 

library php-sentence

Simple text sentence splitting and counting. Supports at least English, German and Dutch, possibly more.


vanderlee/php-sentence


  • Monday, October 30, 2017
  • by vanderlee
  • 7 Watchers
  • 24 Stars
  • 5,273 Installations
  • PHP
  • 2 Dependents
  • 0 Suggesters
  • 5 Forks
  • 1 Open issues
  • 7 Versions
  • 20 % Grown

The README.md

Sentence


Version 1.0.8

Copyright © 2016-2024 Martijn van der Lee (@vanderlee), parts copyright © 2017 @marktaw.

MIT Open Source license applies.

Introduction

PHP natural language sentence segmentation (splitting) and counting, also known as sentence boundary disambiguation.

Still early, but should support most Western languages. If you find any problems, please let me know.

Supports PHP 5.3 and up, so you can use it on older servers.

Installation

Requires PHP 5.4 or greater. PHP 5.3 remains supported as long as no newer language features are strictly necessary.

To install using Composer:

composer require vanderlee/php-sentence

Methods

integer count(string $text)

Counts the number of sentences in the text. Provided for convenience: the result is exactly the number of items in the array returned by split, so if you need both the sentences and their count, call split once and count the returned array.

array split(string $text, integer $flags = 0)

Splits the text into sentences.

$flags is either zero (0, the default) or the following class constant:

  • Sentence::SPLIT_TRIM: Trim whitespace off the left and right sides of each returned sentence.
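The documented effect of Sentence::SPLIT_TRIM can be pictured with plain PHP. This is a minimal sketch, assuming the flag strips the same whitespace characters as PHP's built-in trim(); the helper trim_sentences() and the sample data are hypothetical and not part of the library. With the library itself you would pass the flag instead, e.g. $Sentence->split($text, Sentence::SPLIT_TRIM).

```php
<?php
// Hypothetical helper approximating the documented effect of
// Sentence::SPLIT_TRIM: whitespace is removed from the left and right
// side of every sentence. Not part of the library.
function trim_sentences(array $sentences)
{
    return array_map('trim', $sentences);
}

// Illustrative untrimmed output, as a splitter might return it.
$untrimmed = array("Hello there. ", " How are you?\n", "\tFine!");
$trimmed   = trim_sentences($untrimmed);

// $trimmed is array('Hello there.', 'How are you?', 'Fine!')
print_r($trimmed);
```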

Documentation

Documentation generated from the source code by ApiGen is available here: ApiGen documentation

Examples

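    <?php
    require_once 'vendor/autoload.php';  // Composer's autoloader
    // Assumed class name; newer releases may place it in a namespace
    $Sentence = new Sentence();
    $text     = "Hello there, Mr. Smith. How are you today?\n\nI hope it is good.";

    // Split the text into an array of sentences
    $sentences = $Sentence->split($text);

    // Count the number of sentences
    $count     = $Sentence->count($text);
    ?>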

How it works

The method used is not based on any of the established or published methods. It seems to work pretty well, though.

The method follows a number of simple steps in splitting and re-merging the text into full sentences. You can easily check the steps in the code.
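As a rough illustration of what a first splitting pass could look like (not the library's own code), the sketch below cuts text at line breaks and after runs of `.`, `?` or `!` that are followed by whitespace. The function name split_fragments() is hypothetical. Note how the abbreviation "Mr." leaves "Smith." behind as a one-word fragment, which is the kind of case a re-merge step has to repair.

```php
<?php
// Hypothetical first pass: cut text into candidate fragments at line
// breaks and after runs of terminal punctuation followed by whitespace.
// Punctuation stays attached to the fragment it ends; empty lines yield
// no fragments.
function split_fragments($text)
{
    $fragments = array();
    foreach (preg_split('/\R/u', $text) as $line) {
        // The lookbehind keeps ".", "?" or "!" on the left-hand fragment.
        $parts = preg_split('/(?<=[.?!])\s+/u', trim($line), -1, PREG_SPLIT_NO_EMPTY);
        foreach ($parts as $part) {
            $fragments[] = $part;
        }
    }
    return $fragments;
}

// Yields "Hello there, Mr.", "Smith.", "How are you?" and "Fine!"
print_r(split_fragments("Hello there, Mr. Smith. How are you?\nFine!"));
```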

The splitting may occasionally be a bit off; in particular, abbreviations at the start of a sentence tend to be merged into the preceding sentence. In most ordinary text this poses no problem, and it should not affect the sentence count except in very uncommon situations.

Note that this algorithm depends on reasonably grammatically correct punctuation. Do not L33t-5p3ak!!!!!1!1!11!eleven!!

Rules

The following is a rough list of the rules used to split sentences.

  • Each linebreak separates sentences.
  • The end of the text marks the end of a sentence if it was not otherwise ended through proper punctuation.
  • Sentences must be at least two words long, unless terminated by a linebreak or the end of the text.
  • An empty line is not a sentence.
  • Each question mark or exclamation mark, or any combination thereof, is considered the end of a sentence.
  • A single period is considered the end of a sentence, unless...
    • It is preceded by one word, or...
    • It is followed by one word.
  • A sequence of multiple periods is not considered the end of a sentence.
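The period and punctuation rules above can be approximated in a short sketch. It is a simplified, hypothetical reading of the listed rules, not the library's implementation: it assumes the input is the list of candidate fragments of a single line (each ending after a run of `.`, `?` or `!`, or at the end of the line), reads "preceded by one word" as "the sentence collected so far is one word", and ignores the two-word minimum. The function names merge_fragments(), is_boundary() and word_count() are hypothetical.

```php
<?php
// Simplified, hypothetical reading of the listed rules. Input: candidate
// fragments of ONE line (line breaks always end a sentence, so each line
// is processed separately). Not the library's implementation.

function word_count($text)
{
    return count(preg_split('/\s+/u', trim($text), -1, PREG_SPLIT_NO_EMPTY));
}

// Decide whether the sentence collected so far ($buffer, whose last
// fragment is $fragment) ends here, given the following fragment $next.
function is_boundary($buffer, $fragment, $next)
{
    if (preg_match('/[?!]+$/u', $fragment)) {
        return true;   // any run of "?" and/or "!" ends a sentence
    }
    if (preg_match('/\.\.+$/u', $fragment)) {
        return false;  // multiple periods, such as "...", do not
    }
    if (preg_match('/\.$/u', $fragment)) {
        // A single period ends a sentence unless only one word precedes
        // it or only one word follows it.
        return word_count($buffer) > 1 && word_count($next) > 1;
    }
    return false;      // no terminal punctuation
}

function merge_fragments(array $fragments)
{
    $sentences = array();
    $buffer    = '';
    $last      = count($fragments) - 1;
    foreach ($fragments as $i => $fragment) {
        $buffer = ltrim($buffer . ' ' . $fragment);
        // The end of the line always closes the current sentence.
        if ($i === $last || is_boundary($buffer, $fragment, $fragments[$i + 1])) {
            $sentences[] = $buffer;
            $buffer      = '';
        }
    }
    return $sentences;
}

// Yields "Hello there, Mr. Smith.", "How are you?" and "Fine!"
print_r(merge_fragments(array('Hello there, Mr.', 'Smith.', 'How are you?', 'Fine!')));
```

Fed the fragments of "I met him. Mr. Smith was there.", this sketch returns "I met him. Mr." and "Smith was there.", the same kind of abbreviation merge described under "How it works".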

The Versions

All versions are MIT licensed, require php >=5.3.0 and are hosted at https://github.com/vanderlee/php-sentence.git.

Keywords: count sentence split segmentation boundary disambiguation

  • dev-master (9999999-dev), 30/10 2017
  • dev-gh-pages, 30/10 2017
  • dev-scrutinizer-patch-1, 14/10 2017
  • 1.0.4 (1.0.4.0), 14/10 2017
  • 1.0.3 (1.0.3.0), 21/05 2017
  • 1.0.2 (1.0.2.0), 20/03 2016
  • 1.0.1 (1.0.1.0), 11/02 2016