
Count Number of Characters in a Mixed String of ASCII and Unicode

strlen($username);

Username can carry ASCII, Unicode or both.

Example:

Jam123 (ASCII) - 6 characters

ابت (Unicode) - 3 characters, but strlen returns 6 because each of these characters takes 2 bytes in UTF-8.

Jamت (Unicode and ASCII) - 4 characters, but strlen returns 5 (3 bytes for the ASCII characters and 2 bytes for the Unicode character, even though I have only one Unicode character).

Username in all cases shouldn't go beyond 25 characters and shouldn't be less than 4 characters.

My main problem is with mixing Unicode and ASCII together: how can I keep track of the count so that the condition can decide whether the username is no longer than 25 and no shorter than 4 characters?

if(strlen($username) <= 25 && !(strlen($username) < 4))

3 Unicode characters are counted as 6 bytes, which causes trouble: it lets a user pick a username of 3 Unicode characters when the minimum should be 4 characters.

Numbers will always be in ASCII.


Use mb_strlen(). It counts characters rather than bytes, so multibyte (Unicode) characters are handled correctly.

Example:

mb_strlen("Jamت", "UTF-8"); // 4
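Applied to the length rule from the question, the check might look like the sketch below. The helper name isValidUsernameLength is hypothetical (it is not from the original post), the input is assumed to be valid UTF-8, and the mbstring extension must be enabled.

```php
<?php
// Hypothetical helper (not from the original post): checks that a username
// is between 4 and 25 characters long, counting characters rather than bytes.
// Assumes $username is UTF-8 encoded and the mbstring extension is loaded.
function isValidUsernameLength(string $username, int $min = 4, int $max = 25): bool
{
    $length = mb_strlen($username, 'UTF-8');
    return $length >= $min && $length <= $max;
}

var_dump(isValidUsernameLength('Jam123')); // bool(true): 6 characters
var_dump(isValidUsernameLength('ابت'));    // bool(false): 3 characters, although strlen() reports 6
var_dump(isValidUsernameLength('Jamت'));   // bool(true): 4 characters
```

Keeping the limits as parameters with defaults of 4 and 25 is only a convenience; the original condition can equally be written inline with mb_strlen in place of strlen.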


You can use mb_strlen, which lets you specify the encoding.

http://sandbox.phpcode.eu/g/3a144/1

<?php
echo mb_strlen('ابت', 'UTF-8'); // prints 3
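To make the difference between bytes and characters visible, here is a small sketch that prints both counts for the sample usernames from the question. It assumes the source file and the strings are UTF-8 encoded, where a character can take anywhere from 1 to 4 bytes.

```php
<?php
// Compare byte length (strlen) with character length (mb_strlen) for the
// sample usernames from the question. Assumes UTF-8 encoded strings.
$samples = ['Jam123', 'ابت', 'Jamت'];

foreach ($samples as $name) {
    printf(
        "%s: %d bytes, %d characters\n",
        $name,
        strlen($name),             // number of bytes
        mb_strlen($name, 'UTF-8')  // number of characters (code points)
    );
}
// Jam123: 6 bytes, 6 characters
// ابت: 6 bytes, 3 characters
// Jamت: 5 bytes, 4 characters
```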


A function to count words in a Unicode sentence/string:

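function mb_count_words($string)
{
    // A word is a run of letters (\p{L}), digits (\p{N}) and dashes (\p{Pd}).
    // The braces matter: \pPd would mean "any punctuation" followed by a literal "d".
    preg_match_all('/[\p{L}\p{N}\p{Pd}]+/u', $string, $matches);
    return count($matches[0]);
}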

or

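function mb_count_words($string, $format = 0) {
    // Split on runs of anything that is not a letter, a digit or an apostrophe.
    // PREG_SPLIT_NO_EMPTY drops the empty pieces that leading or trailing
    // separators (or an empty input string) would otherwise produce.
    $words = preg_split('~[^\p{L}\p{N}\']+~u', trim($string), -1, PREG_SPLIT_NO_EMPTY);
    // $format 0 returns the number of words; any other value returns the words themselves.
    return $format == 0 ? count($words) : $words;
}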


then do:

echo mb_count_words("chào buổi sáng"); // prints 3 (assuming precomposed, NFC-normalized input)
