
patchwork/utf8

Portable and performant UTF-8, Unicode and Grapheme Clusters for PHP

  • Wednesday, May 18, 2016
  • by nicolas-grekas
  • Repository
  • 23 Watchers
  • 451 Stars
  • 9,668,610 Installations
  • PHP
  • 75 Dependents
  • 3 Suggesters
  • 35 Forks
  • 10 Open issues
  • 50 Versions
  • 3 % Grown

The README.md

Patchwork UTF-8 for PHP

Patchwork UTF-8 gives PHP developers extensive, portable and performant handling of UTF-8 and grapheme clusters.

It provides both:

  • a portability layer for mbstring, iconv, and intl Normalizer and grapheme_* functions,
  • a UTF-8 grapheme-cluster-aware replica of native string functions.

It can also serve as a documentation source referencing the practical problems that arise when handling UTF-8 in PHP: Unicode concepts, related algorithms, bugs in PHP core, workarounds, etc.

Version 1.2 adds best-fit mappings for UTF-8 to Code Page approximations. It also adds Unicode filesystem access under Windows, preferably using wfio, with a COM-based fallback otherwise.

Portability

Unicode handling in PHP is best performed using a combination of mbstring, iconv, intl and pcre with the u flag enabled. But when an application is expected to run on many servers, be aware that these four extensions are not always enabled.
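
To make the availability question concrete, here is a minimal sketch that reports which of the four building blocks a given PHP build offers. The function name sketch_unicode_support() is hypothetical and not part of Patchwork UTF-8; it relies only on extension_loaded() and preg_match().

```php
<?php
// Hypothetical helper: report which Unicode-related features this PHP build offers.
function sketch_unicode_support()
{
    return array(
        'mbstring' => extension_loaded('mbstring'),
        'iconv'    => extension_loaded('iconv'),
        'intl'     => extension_loaded('intl'),
        // PCRE itself is always present; what varies is its UTF-8 support.
        // Without it, the /u flag makes preg_match() fail and return false.
        'pcre_u'   => @preg_match('/^.$/u', "\xC3\xA9") === 1,
    );
}

foreach (sketch_unicode_support() as $name => $enabled) {
    echo $name, ': ', $enabled ? 'yes' : 'no', "\n";
}
```

The output depends on the server, which is exactly the portability problem described above.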

Patchwork UTF-8 provides pure PHP implementations for three of those four extensions. pcre compiled with Unicode support is required, but it is widely available. The following set of portability fallbacks allows an application to run on a server even if one or more of those extensions are not enabled:

  • utf8_encode, utf8_decode,
  • mbstring: mb_check_encoding, mb_convert_case, mb_convert_encoding, mb_decode_mimeheader, mb_detect_encoding, mb_detect_order, mb_encode_mimeheader, mb_encoding_aliases, mb_get_info, mb_http_input, mb_http_output, mb_internal_encoding, mb_language, mb_list_encodings, mb_output_handler, mb_strlen, mb_strpos, mb_strrpos, mb_strtolower, mb_strtoupper, mb_stripos, mb_stristr, mb_strrchr, mb_strrichr, mb_strripos, mb_strstr, mb_strwidth, mb_substitute_character, mb_substr, mb_substr_count,
  • iconv: iconv, iconv_mime_decode, iconv_mime_decode_headers, iconv_get_encoding, iconv_set_encoding, iconv_mime_encode, ob_iconv_handler, iconv_strlen, iconv_strpos, iconv_strrpos, iconv_substr,
  • intl: Normalizer, grapheme_extract, grapheme_stripos, grapheme_stristr, grapheme_strlen, grapheme_strpos, grapheme_strripos, grapheme_strrpos, grapheme_strstr, grapheme_substr, normalizer_is_normalized, normalizer_normalize.
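
The general shape of such a fallback can be sketched as follows. This is an illustration of the pattern under stated assumptions, not the library's actual code: sketch_utf8_strlen() is a hypothetical helper that counts code points with PCRE only, and the mbstring name is defined only when the extension has not already provided it.

```php
<?php
// Hypothetical pure-PHP fallback: counts code points using PCRE only.
// Returns false when $s is not valid UTF-8.
function sketch_utf8_strlen($s)
{
    return preg_match_all('/./us', $s, $matches);
}

// Define the mbstring function only if the extension did not provide it.
if (!function_exists('mb_strlen')) {
    function mb_strlen($s, $encoding = null)
    {
        // This sketch assumes UTF-8 and ignores $encoding.
        return sketch_utf8_strlen($s);
    }
}

// "h", "é" (two bytes), "l", "l", "o": 5 code points, although strlen() reports 6 bytes.
echo mb_strlen("h\xC3\xA9llo", 'UTF-8'), "\n";
```

Because the definition is guarded by function_exists(), the native implementation is used whenever mbstring is enabled.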

Patchwork\Utf8

Grapheme clusters should always be considered when working with generic Unicode strings. The Patchwork\Utf8 class implements the quasi-complete set of native string functions that need UTF-8 grapheme cluster awareness. Function names, arguments and behavior carefully replicate native PHP string functions.

Some more functions are also provided to help handle UTF-8 strings:
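
To see why grapheme clusters matter, the following sketch measures one string three ways using only core PHP and PCRE: in bytes, in code points, and in grapheme clusters (PCRE's \X escape). It does not use the library.

```php
<?php
// "e" followed by U+0301 COMBINING ACUTE ACCENT, then "a": displayed as two characters.
$s = "e\xCC\x81a";

$bytes      = strlen($s);                      // counts bytes
$codePoints = preg_match_all('/./us', $s, $m); // counts Unicode code points
$graphemes  = preg_match_all('/\X/u', $s, $m); // counts grapheme clusters

printf("bytes=%d code points=%d graphemes=%d\n", $bytes, $codePoints, $graphemes);
// prints "bytes=4 code points=3 graphemes=2"
```

Only the last count matches what a reader perceives as the length of the string.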

  • filter(): normalizes to UTF-8 NFC, converting from CP-1252 when needed,
  • isUtf8(): checks if a string contains well-formed UTF-8 data,
  • toAscii(): generic UTF-8 to ASCII transliteration,
  • strtocasefold(): Unicode transformation for caseless matching,
  • strtonatfold(): generic case-sensitive transformation for collation matching,
  • strwidth(): computes the width of a string when printed on a terminal,
  • wrapPath(): Unicode filesystem access under Windows and other OSes.
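
As an illustration of what a well-formedness check involves, here is a minimal sketch built on PCRE's u flag, which refuses subjects that are not valid UTF-8. The name sketch_is_utf8() is hypothetical; whether isUtf8() is implemented this way is an assumption.

```php
<?php
// Hypothetical helper: true when $s is well-formed UTF-8.
// An empty pattern with the /u flag matches any valid UTF-8 subject;
// on malformed input preg_match() returns false instead of 1.
function sketch_is_utf8($s)
{
    return preg_match('//u', $s) === 1;
}

var_dump(sketch_is_utf8("caf\xC3\xA9")); // bool(true): "é" encoded as UTF-8
var_dump(sketch_is_utf8("caf\xE9"));     // bool(false): "é" encoded as ISO-8859-1 / CP-1252
```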

Mirrored string functions are: strlen, substr, strpos, stripos, strrpos, strripos, strstr, stristr, strrchr, strrichr, strtolower, strtoupper, wordwrap, chr, count_chars, ltrim, ord, rtrim, trim, str_ireplace, str_pad, str_shuffle, str_split, str_word_count, strcmp, strnatcmp, strcasecmp, strnatcasecmp, strncasecmp, strncmp, strcspn, strpbrk, strrev, strspn, strtr, substr_compare, substr_count, substr_replace, ucfirst, lcfirst, ucwords, number_format, utf8_encode, utf8_decode, json_decode, filter_input, filter_input_array.
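
To show why a function such as strrev needs a grapheme-aware counterpart, here is a minimal sketch that reverses a string cluster by cluster. sketch_grapheme_strrev() is a hypothetical name and a simplified stand-in, not the library's implementation.

```php
<?php
// Hypothetical helper: reverse a UTF-8 string one grapheme cluster at a time.
function sketch_grapheme_strrev($s)
{
    // \X matches one grapheme cluster; 0 (empty) or false (invalid UTF-8) leaves $s untouched.
    if (!preg_match_all('/\X/u', $s, $m)) {
        return $s;
    }

    return implode('', array_reverse($m[0]));
}

$s = "ae\xCC\x81"; // "a", then "e" + U+0301 COMBINING ACUTE ACCENT

$good = sketch_grapheme_strrev($s); // "e\xCC\x81a": the accent stays on the "e"
$bad  = strrev($s);                 // "\x81\xCCea": bytes reversed, no longer valid UTF-8

var_dump(preg_match('//u', $good) === 1); // bool(true)
var_dump(preg_match('//u', $bad) === 1);  // bool(false)
```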

Notably missing (but hard to replicate) are printf-family functions.

The implementation favors performance over full edge-case handling. It generally works on UTF-8 normalized strings and provides filters to produce them.

As the Turkish locale requires special care, a Patchwork\TurkishUtf8 class is provided for working with this locale. It clones all the features of Patchwork\Utf8 but knows about the Turkish specifics.
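
The Turkish specifics mainly concern the letter i: in Turkish, lowercase "i" uppercases to "İ" (U+0130) and uppercase "I" lowercases to "ı" (U+0131). The sketch below illustrates that mapping with hypothetical helpers; it only handles ASCII letters plus the two Turkish forms and assumes the default C locale, so it is far narrower than what Patchwork\TurkishUtf8 covers.

```php
<?php
// Hypothetical helpers: apply the Turkish dotted/dotless i rules first,
// then let the native ASCII case conversion handle the remaining letters.
function sketch_turkish_strtoupper($s)
{
    // "i" -> "İ" (U+0130), "ı" (U+0131) -> "I"
    return strtoupper(strtr($s, array('i' => "\xC4\xB0", "\xC4\xB1" => 'I')));
}

function sketch_turkish_strtolower($s)
{
    // "I" -> "ı" (U+0131), "İ" (U+0130) -> "i"
    return strtolower(strtr($s, array('I' => "\xC4\xB1", "\xC4\xB0" => 'i')));
}

echo strtoupper('istanbul'), "\n";                // ISTANBUL (generic rule)
echo sketch_turkish_strtoupper('istanbul'), "\n"; // İSTANBUL (Turkish rule)
echo sketch_turkish_strtolower('ISTANBUL'), "\n"; // ıstanbul (Turkish rule)
```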

Usage

The recommended way to install Patchwork UTF-8 is through Composer. Just create a composer.json file and run the php composer.phar install command to install it:

{
    "require": {
        "patchwork/utf8": "~1.2"
    }
}

Then, early in your bootstrap sequence, configure your environment:

require __DIR__ . '/vendor/autoload.php'; // Composer autoloader; adjust the path to your project layout

\Patchwork\Utf8\Bootup::initAll(); // Enables the portability layer and configures PHP for UTF-8
\Patchwork\Utf8\Bootup::filterRequestUri(); // Redirects to a UTF-8 encoded URL if it's not already the case
\Patchwork\Utf8\Bootup::filterRequestInputs(); // Normalizes HTTP inputs to UTF-8 NFC

Run phpunit to see the code in action.

Make sure that you are confident about using UTF-8 by reading Character Sets / Character Encoding Issues and Handling UTF-8 with PHP, or PHP et UTF-8 for French readers.

You should also get familiar with the concept of Unicode Normalization and Grapheme Clusters.

Do not blindly replace all uses of PHP's string functions. Most of the time you will not need to, and you will be introducing a significant performance overhead to your application.
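
The reason replacement is often unnecessary is that many byte-oriented native functions are already safe on well-formed UTF-8, because a valid UTF-8 sequence never appears inside another character's encoding. The sketch below uses only core PHP and contrasts operations that are safe as they are with one that is not.

```php
<?php
$s = "na\xC3\xAFve,caf\xC3\xA9"; // "naïve,café" in UTF-8

// Safe with native functions: searching, splitting and replacing whole substrings.
$parts    = explode(',', $s);                  // two intact UTF-8 strings
$replaced = str_replace("\xC3\xA9", 'e', $s);  // "naïve,cafe"
$hasCafe  = strpos($s, "caf\xC3\xA9") !== false;

// Not safe: offsets and lengths are counted in bytes, so a cut can land mid-character.
$cut        = substr("caf\xC3\xA9", 0, 4);     // "caf\xC3": half of "é" is left behind
$cutIsValid = preg_match('//u', $cut) === 1;

var_dump($parts, $replaced, $hasCafe, $cutIsValid);
```

Functions that take or return positions and lengths, or that change case, are the ones that usually deserve a UTF-8 aware replacement.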

Screen your input on the outer perimeter so that only well-formed UTF-8 passes through. When dealing with badly formed UTF-8, you should not try to fix it (see Unicode Security Considerations). Instead, consider it as CP-1252 and use Patchwork\Utf8::utf8_encode() to get a UTF-8 string. Also, don't forget to choose one Unicode normalization form and stick to it. NFC is now the de facto standard. Patchwork\Utf8::filter() implements this behavior: it converts from CP-1252 and to NFC.
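
The screening policy described above can be sketched in a few lines. sketch_filter() is a hypothetical, simplified stand-in for illustration: it maps malformed input byte by byte as ISO-8859-1, which agrees with CP-1252 except for the range 0x80 to 0x9F, and it normalizes to NFC only when the intl Normalizer class happens to be available.

```php
<?php
// Hypothetical helper: accept well-formed UTF-8 as is, reinterpret anything else
// as a single-byte encoding, then normalize to NFC when possible.
function sketch_filter($s)
{
    if (preg_match('//u', $s) !== 1) {
        // Simplification: bytes are treated as ISO-8859-1. A full version needs
        // a CP-1252 table for the bytes 0x80-0x9F.
        $out = '';
        for ($i = 0, $len = strlen($s); $i < $len; ++$i) {
            $b = ord($s[$i]);
            $out .= $b < 0x80
                ? $s[$i]
                : (chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F)));
        }
        $s = $out;
    }

    // NFC normalization is skipped in this sketch when intl is not loaded.
    if (class_exists('Normalizer')) {
        $normalized = Normalizer::normalize($s, Normalizer::FORM_C);
        if (is_string($normalized)) {
            $s = $normalized;
        }
    }

    return $s;
}

var_dump(sketch_filter("caf\xE9") === "caf\xC3\xA9");     // single-byte "é" becomes UTF-8
var_dump(sketch_filter("caf\xC3\xA9") === "caf\xC3\xA9"); // valid UTF-8 passes through
```

Note that the malformed input is never patched up in place: it is reinterpreted as a whole under a different encoding.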

This library is incompatible with mbstring.func_overload and will not work if that php.ini setting is enabled.
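
An application can detect this situation early in its bootstrap. The check below is a sketch with a hypothetical function name; it only reads the setting through ini_get().

```php
<?php
// Hypothetical helper: true when mbstring.func_overload is active.
function sketch_func_overload_enabled()
{
    // ini_get() returns false when the setting does not exist (for example on
    // PHP 8, where it was removed, or without mbstring); that casts to 0.
    return (int) ini_get('mbstring.func_overload') !== 0;
}

if (sketch_func_overload_enabled()) {
    throw new RuntimeException('Disable mbstring.func_overload in php.ini before using this library.');
}
```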

Licensing

Patchwork\Utf8 is free software; you can redistribute it and/or modify it under the terms of (at your option):

  • Apache License v2.0, or
  • GNU General Public License v2.0.

Unicode handling requires tedious work to implement and maintain in the long run. As such, contributions such as unit tests, bug reports, comments or patches licensed under both licenses are very welcome.

I hope many projects will adopt this code and together help solve Unicode handling for PHP.

The Versions

18/05 2016

dev-master

9999999-dev https://github.com/tchwork/utf8

Portable and performant UTF-8, Unicode and Grapheme Clusters for PHP

  Sources   Download

(Apache-2.0 or GPL-2.0)

The Requires

  • php >=5.3.0
  • lib-pcre >=7.3

 

utf-8 i18n unicode utf8 grapheme

18/05 2016

v1.3.1

1.3.1.0 https://github.com/tchwork/utf8

Portable and performant UTF-8, Unicode and Grapheme Clusters for PHP

  Sources   Download

(Apache-2.0 or GPL-2.0)

The Requires

  • php >=5.3.0
  • lib-pcre >=7.3

 

utf-8 i18n unicode utf8 grapheme

15/12 2015

v1.3.0

1.3.0.0 https://github.com/tchwork/utf8

Portable and performant UTF-8, Unicode and Grapheme Clusters for PHP

  Sources   Download

(Apache-2.0 or GPL-2.0)

The Requires

  • php >=5.3.0
  • lib-pcre >=7.3

 

utf-8 i18n unicode utf8 grapheme

15/12 2015

1.2.x-dev

1.2.9999999.9999999-dev https://github.com/tchwork/utf8

Portable and performant UTF-8, Unicode and Grapheme Clusters for PHP

  Sources   Download

(Apache-2.0 or GPL-2.0)

The Requires

  • php >=5.3.0
  • lib-pcre >=7.3

 

utf-8 i18n unicode utf8 grapheme

15/12 2015

v1.2.6

1.2.6.0 https://github.com/tchwork/utf8

Portable and performant UTF-8, Unicode and Grapheme Clusters for PHP

  Sources   Download

(Apache-2.0 or GPL-2.0)

The Requires

  • php >=5.3.0
  • lib-pcre >=7.3

 

utf-8 i18n unicode utf8 grapheme

15/12 2015

1.1.x-dev

1.1.9999999.9999999-dev https://github.com/tchwork/utf8

Portable and performant UTF-8, Unicode and Grapheme Clusters for PHP

  Sources   Download

(Apache-2.0 or GPL-2.0)

The Requires

  • php >=5.3.0
  • lib-pcre >=7.3

 

utf-8 i18n unicode utf8 grapheme

15/12 2015

v1.1.31

1.1.31.0 https://github.com/tchwork/utf8

Portable and performant UTF-8, Unicode and Grapheme Clusters for PHP

  Sources   Download

(Apache-2.0 or GPL-2.0)

The Requires

  • php >=5.3.0
  • lib-pcre >=7.3

 

utf-8 i18n unicode utf8 grapheme

14/10 2015

v1.2.5

1.2.5.0 https://github.com/tchwork/utf8

Portable and performant UTF-8, Unicode and Grapheme Clusters for PHP

  Sources   Download

(Apache-2.0 or GPL-2.0)

The Requires

  • php >=5.3.0
  • lib-pcre >=7.3

 

utf-8 i18n unicode utf8 grapheme

29/06 2015

v1.1.30

1.1.30.0 https://github.com/tchwork/utf8

Portable and performant UTF-8, Unicode and Grapheme Clusters for PHP

  Sources   Download

(Apache-2.0 or GPL-2.0)

The Requires

  • php >=5.3.0
  • lib-pcre >=7.3

 

utf-8 i18n unicode utf8 grapheme

25/06 2015

v1.2.3

1.2.3.0 https://github.com/tchwork/utf8

Portable and performant UTF-8, Unicode and Grapheme Clusters for PHP

  Sources   Download

(Apache-2.0 or GPL-2.0)

The Requires

  • php >=5.3.0
  • lib-pcre >=7.3

 

utf-8 i18n unicode utf8 grapheme

26/04 2015

v1.2.2

1.2.2.0 https://github.com/tchwork/utf8

Portable and performant UTF-8, Unicode and Grapheme Clusters for PHP

  Sources   Download

(Apache-2.0 or GPL-2.0)

The Requires

  • php >=5.3.0
  • lib-pcre >=7.3

 

utf-8 i18n unicode utf8 grapheme

26/04 2015

v1.1.29

1.1.29.0 https://github.com/tchwork/utf8

Portable and performant UTF-8, Unicode and Grapheme Clusters for PHP

  Sources   Download

(Apache-2.0 or GPL-2.0)

The Requires

  • php >=5.3.0
  • lib-pcre >=7.3

 

utf-8 i18n unicode utf8 grapheme

29/01 2015

v1.2.1

1.2.1.0 https://github.com/tchwork/utf8

Portable and performant UTF-8, Unicode and Grapheme Clusters for PHP

  Sources   Download

(Apache-2.0 or GPL-2.0)

The Requires

  • php >=5.3.0
  • lib-pcre >=7.3

 

utf-8 i18n unicode utf8 grapheme

12/01 2015

v1.2.0

1.2.0.0 https://github.com/tchwork/utf8

Portable and performant UTF-8, Unicode and Grapheme Clusters for PHP

  Sources   Download

(Apache-2.0 or GPL-2.0)

The Requires

  • php >=5.3.0
  • lib-pcre >=7.3

 

utf-8 i18n unicode utf8 grapheme

12/01 2015

v1.1.28

1.1.28.0 https://github.com/tchwork/utf8

Portable and performant UTF-8, Unicode and Grapheme Clusters for PHP

  Sources   Download

(Apache-2.0 or GPL-2.0)

The Requires

  • php >=5.3.0
  • lib-pcre >=7.3

 

utf-8 i18n unicode utf8 grapheme

11/01 2015

v1.1.27

1.1.27.0 https://github.com/tchwork/utf8

Portable and performant UTF-8, Unicode and Grapheme Clusters for PHP

  Sources   Download

(Apache-2.0 or GPL-2.0)

The Requires

  • php >=5.3.0
  • lib-pcre >=7.3

 

utf-8 i18n unicode utf8 grapheme

08/11 2014

v1.1.26

1.1.26.0 https://github.com/tchwork/utf8

Portable and performant UTF-8, Unicode and Grapheme Clusters for PHP

  Sources   Download

(Apache-2.0 or GPL-2.0)

The Requires

  • php >=5.3.0
  • lib-pcre >=7.3

 

utf-8 i18n unicode utf8 grapheme

05/08 2014

v1.2.0-beta

1.2.0.0-beta https://github.com/nicolas-grekas/Patchwork-UTF8

Extensive, portable and performant handling of UTF-8 and grapheme clusters for PHP

  Sources   Download

(Apache-2.0 or GPL-2.0)

The Requires

  • php >=5.3.0
  • lib-pcre >=7.3

 

utf-8 i18n unicode utf8

05/08 2014

v1.1.25

1.1.25.0 https://github.com/nicolas-grekas/Patchwork-UTF8

Extensive, portable and performant handling of UTF-8 and grapheme clusters for PHP

  Sources   Download

(Apache-2.0 or GPL-2.0)

The Requires

  • php >=5.3.0
  • lib-pcre >=7.3

 

utf-8 i18n unicode utf8

17/06 2014

v1.1.24

1.1.24.0 https://github.com/nicolas-grekas/Patchwork-UTF8

Extensive, portable and performant handling of UTF-8 and grapheme clusters for PHP

  Sources   Download

(Apache-2.0 or GPL-2.0)

The Requires

  • php >=5.3.0
  • lib-pcre >=7.3

 

utf-8 i18n unicode utf8

22/05 2014

v1.1.23

1.1.23.0 https://github.com/nicolas-grekas/Patchwork-UTF8

Extensive, portable and performant handling of UTF-8 and grapheme clusters for PHP

  Sources   Download

(Apache-2.0 or GPL-2.0)

The Requires

  • php >=5.3.0
  • lib-pcre >=7.3

 

utf-8 i18n unicode utf8

06/05 2014

v1.1.22

1.1.22.0 https://github.com/nicolas-grekas/Patchwork-UTF8

Extensive, portable and performant handling of UTF-8 and grapheme clusters for PHP

  Sources   Download

(Apache-2.0 or GPL-2.0)

The Requires

  • php >=5.3.0
  • lib-pcre >=7.3

 

utf-8 i18n unicode utf8

26/03 2014

v1.1.21

1.1.21.0 https://github.com/nicolas-grekas/Patchwork-UTF8

Extensive, portable and performant handling of UTF-8 and grapheme clusters for PHP

  Sources   Download

(Apache-2.0 or GPL-2.0)

The Requires

  • php >=5.3.0
  • lib-pcre >=7.3

 

utf-8 i18n unicode utf8

01/03 2014

v1.1.20

1.1.20.0 https://github.com/nicolas-grekas/Patchwork-UTF8

Extensive, portable and performant handling of UTF-8 and grapheme clusters for PHP

  Sources   Download

(Apache-2.0 or GPL-2.0)

The Requires

  • php >=5.3.0
  • lib-pcre >=7.3

 

utf-8 i18n unicode utf8

02/02 2014

v1.1.18

1.1.18.0 https://github.com/nicolas-grekas/Patchwork-UTF8

Extensive, portable and performant handling of UTF-8 and grapheme clusters for PHP

  Sources   Download

(Apache-2.0 or GPL-2.0)

The Requires

  • php >=5.3.0
  • lib-pcre >=7.3

 

utf-8 i18n unicode utf8

02/01 2014

v1.1.17

1.1.17.0 https://github.com/nicolas-grekas/Patchwork-UTF8

Extensive, portable and performant handling of UTF-8 and grapheme clusters for PHP

  Sources   Download

(Apache-2.0 or GPL-2.0)

The Requires

  • php >=5.3.0
  • lib-pcre >=7.9

 

utf-8 i18n unicode utf8

07/12 2013

v1.1.16

1.1.16.0 https://github.com/nicolas-grekas/Patchwork-UTF8

Extensive, portable and performant handling of UTF-8 and grapheme clusters for PHP

  Sources   Download

(Apache-2.0 or GPL-2.0)

The Requires

  • php >=5.3.0
  • lib-pcre *

 

utf-8 i18n unicode utf8

23/11 2013

v1.1.15

1.1.15.0 https://github.com/nicolas-grekas/Patchwork-UTF8

Extensive, portable and performant handling of UTF-8 and grapheme clusters for PHP

  Sources   Download

(Apache-2.0 or GPL-2.0)

The Requires

  • php >=5.3.0
  • lib-pcre *

 

utf-8 i18n unicode utf8

04/11 2013

v1.1.14

1.1.14.0 https://github.com/nicolas-grekas/Patchwork-UTF8

Extensive, portable and performant handling of UTF-8 and grapheme clusters for PHP

  Sources   Download

(Apache-2.0 or GPL-2.0)

The Requires

  • php >=5.3.0
  • lib-pcre *

 

utf-8 i18n unicode utf8

12/10 2013

v1.1.13

1.1.13.0 https://github.com/nicolas-grekas/Patchwork-UTF8

Extensive, portable and performant handling of UTF-8 and grapheme clusters for PHP

  Sources   Download

(Apache-2.0 or GPL-2.0)

The Requires

  • php >=5.3.0
  • lib-pcre *

 

utf-8 i18n unicode utf8

04/10 2013

v1.1.12

1.1.12.0 https://github.com/nicolas-grekas/Patchwork-UTF8

UTF-8 strings handling for PHP 5.3: portable, performant and extended

  Sources   Download

(Apache-2.0 or GPL-2.0)

The Requires

  • php >=5.3.0
  • lib-pcre *

 

utf-8 i18n unicode utf8

19/08 2013

v1.1.11

1.1.11.0 https://github.com/nicolas-grekas/Patchwork-UTF8

UTF-8 strings handling for PHP 5.3: portable, performant and extended

  Sources   Download

(Apache-2.0 or GPL-2.0)

The Requires

  • php >=5.3.0

 

utf-8 i18n unicode utf8

13/08 2013

v1.1.10

1.1.10.0 https://github.com/nicolas-grekas/Patchwork-UTF8

UTF-8 strings handling for PHP 5.3: portable, performant and extended

  Sources   Download

(Apache-2.0 or GPL-2.0)

The Requires

  • php >=5.3.0

 

utf-8 i18n unicode utf8

04/08 2013

v1.1.9

1.1.9.0 https://github.com/nicolas-grekas/Patchwork-UTF8

UTF-8 strings handling for PHP 5.3: portable, performant and extended

  Sources   Download

(Apache-2.0 or GPL-2.0)

The Requires

  • php >=5.3.0

 

utf-8 i18n unicode utf8

24/05 2013

v1.1.8

1.1.8.0 https://github.com/nicolas-grekas/Patchwork-UTF8

UTF-8 strings handling for PHP 5.3: portable, performant and extended

  Sources   Download

(Apache-2.0 or GPL-2.0)

The Requires

  • php >=5.3.0

 

utf-8 i18n unicode utf8

23/05 2013

v1.1.7

1.1.7.0 https://github.com/nicolas-grekas/Patchwork-UTF8

UTF-8 strings handling for PHP 5.3: portable, performant and extended

  Sources   Download

(Apache-2.0 or GPL-2.0)

The Requires

  • php >=5.3.0

 

utf-8 i18n unicode utf8

13/05 2013

v1.1.6

1.1.6.0 https://github.com/nicolas-grekas/Patchwork-UTF8

UTF-8 strings handling for PHP 5.3: portable, performant and extended

  Sources   Download

(Apache-2.0 or GPL-2.0)

The Requires

  • php >=5.3.0

 

utf-8 i18n unicode utf8

13/05 2013

v1.1.5

1.1.5.0 https://github.com/nicolas-grekas/Patchwork-UTF8

UTF-8 strings handling for PHP 5.3: portable, performant and extended

  Sources   Download

(Apache-2.0 or GPL-2.0)

The Requires

  • php >=5.3.0

 

utf-8 i18n unicode utf8

13/05 2013

v1.1.4

1.1.4.0 https://github.com/nicolas-grekas/Patchwork-UTF8

UTF-8 strings handling for PHP 5.3: portable, performant and extended

  Sources   Download

(Apache-2.0 or GPL-2.0)

The Requires

  • php >=5.3.0

 

utf-8 i18n unicode utf8

22/04 2013

v1.0.6

1.0.6.0 https://github.com/nicolas-grekas/Patchwork-UTF8

UTF-8 strings handling for PHP 5.3: portable, performant and extended

  Sources   Download

(Apache-2.0 or GPL-2.0)

The Requires

  • php >=5.3.0

 

utf-8 i18n unicode utf8

18/04 2013

v1.1.3

1.1.3.0 https://github.com/nicolas-grekas/Patchwork-UTF8

UTF-8 strings handling for PHP 5.3: portable, performant and extended

  Sources   Download

(Apache-2.0 or GPL-2.0)

The Requires

  • php >=5.3.0

 

utf-8 i18n unicode utf8

18/04 2013

v1.1.1

1.1.1.0 https://github.com/nicolas-grekas/Patchwork-UTF8

UTF-8 strings handling for PHP 5.3: portable, performant and extended

  Sources   Download

(Apache-2.0 or GPL-2.0)

The Requires

  • php >=5.3.0

 

utf-8 i18n unicode utf8

18/04 2013

v1.1.2

1.1.2.0 https://github.com/nicolas-grekas/Patchwork-UTF8

UTF-8 strings handling for PHP 5.3: portable, performant and extended

  Sources   Download

(Apache-2.0 or GPL-2.0)

The Requires

  • php >=5.3.0

 

utf-8 i18n unicode utf8

18/04 2013

v1.1.0

1.1.0.0 https://github.com/nicolas-grekas/Patchwork-UTF8

UTF-8 strings handling for PHP 5.3: portable, performant and extended

  Sources   Download

(Apache-2.0 or GPL-2.0)

The Requires

  • php >=5.3.0

 

utf-8 i18n unicode utf8

18/04 2013

v1.0.5

1.0.5.0 https://github.com/nicolas-grekas/Patchwork-UTF8

UTF-8 strings handling for PHP 5.3: portable, performant and extended

  Sources   Download

(Apache-2.0 or GPL-2.0)

The Requires

  • php >=5.3.0

 

utf-8 i18n unicode utf8

13/12 2012

v1.0.4

1.0.4.0 https://github.com/nicolas-grekas/Patchwork-UTF8

UTF-8 strings handling for PHP 5.3: portable, performant and extended

  Sources   Download

(Apache-2.0 or GPL-2.0)

The Requires

  • php >=5.3.0

 

utf-8 i18n unicode utf8

18/10 2012

v1.0.3

1.0.3.0 https://github.com/nicolas-grekas/Patchwork-UTF8

UTF-8 strings handling for PHP 5.3: portable, performant and extended

  Sources   Download

(Apache-2.0 or GPL-2.0)

The Requires

  • php >=5.3.0

 

utf-8 i18n unicode utf8

17/10 2012

v1.0.2

1.0.2.0 https://github.com/nicolas-grekas/Patchwork-UTF8

UTF-8 portability and grapheme clusters for PHP 5.3

  Sources   Download

(Apache-2.0 or GPL-2.0)

The Requires

  • php >=5.3.0

 

utf-8 i18n unicode utf8

16/10 2012

v1.0.1

1.0.1.0 https://github.com/nicolas-grekas/Patchwork-UTF8

UTF-8 portability and grapheme clusters for PHP 5.3

  Sources   Download

(Apache-2.0 or GPL-2.0)

The Requires

  • php >=5.3.0

 

utf-8 i18n unicode utf8

15/10 2012

v1.0.0

1.0.0.0 https://github.com/nicolas-grekas/Patchwork-UTF8

UTF-8 portability and grapheme clusters for PHP 5.3

  Sources   Download

(Apache-2.0 or GPL-2.0)

The Requires

  • php >=5.3.0

 

utf-8 i18n unicode utf8