PHP strlen and mb_strlen not working as expected

The PHP functions strlen() and mb_strlen() are both returning the wrong number of characters when I run them on a string.

Here is a piece of the code I'm using...

 $foo = mb_strlen($itemDetails['ITEMDESC'], 'UTF-8');
 echo $foo;

It is telling me this string - "4½" Straight Iris Scissors" is 45 characters long. It's 27.

It also tells me that this string - "Infant Heel Warmer, No Adhesive Attachment Pad, 100/cs" is 54, which is correct.

I assume it's some issue with character encoding; everything should be UTF-8, I think. I've tried passing mb_strlen() several different character encodings, and they all return this oddball count for the string that contains the non-standard characters.

I have no idea why this is happening.


Double-check whether your text really is UTF-8 or not. That "Â" character makes it look like a classic character encoding problem to me. You should check the entire path from the origin of the text through the point in your code that you quoted above, because there are a lot of places where the encodings can get munged.
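As a rough way to do that check at any point along the path, you can compare the byte count with the character count and look at the raw bytes. This is only a sketch: `$sample` is a made-up stand-in for `$itemDetails['ITEMDESC']`, and it assumes the mbstring extension is loaded.

```php
<?php
// Hypothetical sample; in the question this would be $itemDetails['ITEMDESC'].
$sample = "4\u{00BD}"; // "4½" written as valid UTF-8 (PHP 7+ escape syntax)

// true when the bytes form a well-formed UTF-8 sequence
$isUtf8 = mb_check_encoding($sample, 'UTF-8');

$byteCount = strlen($sample);             // counts bytes: 3
$charCount = mb_strlen($sample, 'UTF-8'); // counts characters: 2

// Hex dump of the raw bytes; a UTF-8 "½" shows up as c2bd.
$hex = bin2hex($sample); // "34c2bd"

var_dump($isUtf8, $byteCount, $charCount, $hex);
```

If the hex dump of your real data shows something other than c2bd where the "½" should be, the text was re-encoded somewhere before it reached this code.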

Did the text originate from an HTML form? Ensure your <form> element includes the accept-charset="UTF-8" attribute.

Did the text get stored in a database along the way? Make sure the database stores and returns the data in UTF-8. This means checking the server's global defaults, the defaults for the database or schema, and the table itself.
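One way the encoding commonly gets munged along that path is double encoding: some layer assumes the UTF-8 bytes are ISO-8859-1 and converts them to UTF-8 a second time. The sketch below simulates that with mb_convert_encoding(); whether this is what happened to your data is an assumption you would need to confirm from the raw bytes.

```php
<?php
$original = "\u{00BD}"; // "½" as UTF-8, bytes c2 bd

// Simulate a layer that wrongly treats the bytes as ISO-8859-1
// and converts them to UTF-8 a second time.
$doubleEncoded = mb_convert_encoding($original, 'UTF-8', 'ISO-8859-1');

echo bin2hex($doubleEncoded), "\n";            // c382c2bd, displayed as "Â½"
echo mb_strlen($original, 'UTF-8'), "\n";      // 1
echo mb_strlen($doubleEncoded, 'UTF-8'), "\n"; // 2

// Reversing the mistaken conversion restores the original bytes.
$repaired = mb_convert_encoding($doubleEncoded, 'ISO-8859-1', 'UTF-8');
var_dump($repaired === $original); // bool(true)
```

Every non-ASCII character grows this way, so a string with a few such characters reports a longer length than expected. Reversing the conversion is only safe when you know the data really was double-encoded; fixing the layer that caused it is the better repair.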


It is very likely that your input is encoded in UTF-16. You can convert it to UTF-8 first:

$foo = mb_strlen(mb_convert_encoding($itemDetails['ITEMDESC'], "UTF-8", "UTF-16"), "UTF-8");

Or, if you call mb_strlen() directly, be sure to pass the string's actual encoding as the second parameter:

$foo = mb_strlen($itemDetails['ITEMDESC'], "UTF-16");

Without the correct encoding, mb_strlen() will return wrong results for any string that contains multibyte characters. It's easy to get into trouble when you're dealing with UTF-8/16/32 encoded strings. mb_detect_encoding() will not solve this problem.
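To see how much the second parameter matters, here is a small sketch that measures the same text under matching and mismatched encodings. The sample string is made up, and UTF-16BE is named explicitly so that byte order and BOM handling don't affect the result.

```php
<?php
$utf8  = "abcd";                                          // 4 bytes in UTF-8
$utf16 = mb_convert_encoding($utf8, 'UTF-16BE', 'UTF-8'); // 8 bytes in UTF-16BE

echo mb_strlen($utf8, 'UTF-8'), "\n";     // 4: encoding matches the data
echo mb_strlen($utf8, 'UTF-16BE'), "\n";  // 2: each byte pair is read as one character
echo mb_strlen($utf16, 'UTF-16BE'), "\n"; // 4: encoding matches the data
echo mb_strlen($utf16, 'UTF-8'), "\n";    // 8: each byte, NULs included, counts as a character
```

The count is only right when the encoding you pass matches the bytes you actually have, so find out what the data really is before picking a value.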
