Cut a UTF-8 text in PHP

I get UTF-8 text from a database, and I want to show only the first $len characters (ending on a whole word). I've tried several options, but the function still doesn't work because of special characters (á, é, í, ó, etc.).

Thanks for the help!

function text_limit($text, $len, $end='...')
{ 

  mb_internal_encoding('UTF-8');
  if( (mb_strlen($text, 'UTF-8') > $len) ) { 

    $text = mb_substr($text, 0, $len, 'UTF-8');
    $text = mb_substr($text, 0, mb_strrpos($text," ", 'UTF-8'), 'UTF-8');

    ...
  }
}

Edit to add an example

If I truncate a text with 65 characters, it returns:

Un jardín de estilo neoclásico acorde con el …

If I change the special characters (í, á), then it returns:

Un jardin de estilo neoclasico acorde con el Palacio de …

I'm sure there is something strange with the encoding or the server, or php; but I can't figure it out! Thanks!

Final Solution

I'm using this UTF8 PHP library and everything works now...


Use mb_substr(). The first argument is the string to cut, the second is the starting position, the third is the length (in characters), and the last is the encoding.

mb_substr ("String", 0, $len, 'utf-8');


mb_strrpos($text," ", 'UTF-8')

You are not passing enough arguments to mb_strrpos(): you have omitted the offset, which is the 3rd parameter, and the encoding is the 4th. Try:

mb_strrpos($text," ", 0, 'UTF-8')

With the 2nd line omitted, it looks OK. Since you say "I want to show only the first $len characters (finishing in a word)", I take it the 2nd line is there to make sure the text finishes on a whole word?
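
To show the corrected call in context, here is a minimal sketch of the whole function. The original code elides what happens after the cut, so appending $end and returning the result is an assumption on my part, as is the fallback when no space is found:

```php
<?php
// Sketch: truncate UTF-8 text to at most $len characters, ending on a whole word.
// Appending $end and the return values are assumptions; the question elides that part.
function text_limit($text, $len, $end = '...')
{
    if (mb_strlen($text, 'UTF-8') <= $len) {
        return $text; // already short enough, nothing to cut
    }

    // Cut by characters, not bytes.
    $cut = mb_substr($text, 0, $len, 'UTF-8');

    // Position of the last space: offset (0) is the 3rd argument, encoding the 4th.
    $lastSpace = mb_strrpos($cut, ' ', 0, 'UTF-8');
    if ($lastSpace !== false) {
        // Drop the trailing partial word.
        $cut = mb_substr($cut, 0, $lastSpace, 'UTF-8');
    }

    return $cut . $end;
}
```

This assumes the source file itself and the input string are both UTF-8. Like the original, it drops the last word even when the cut happens to land exactly at the end of a word.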

EDIT: mb_substr() should be cutting at $len number of characters, not bytes. Are you sure the original text is actually UTF-8 and not some other encoding?
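
One way to check that is mb_check_encoding(). A quick sketch, where is_utf8() is just a hypothetical helper name and the ISO-8859-1 string stands in for text that came out of the database in the wrong encoding:

```php
<?php
// Hypothetical helper: true if $text is a valid UTF-8 byte sequence.
function is_utf8($text)
{
    return mb_check_encoding($text, 'UTF-8');
}

$utf8   = "jardín";                                            // UTF-8 source file assumed
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');   // same word as ISO-8859-1 bytes

var_dump(is_utf8($utf8));   // valid UTF-8
var_dump(is_utf8($latin1)); // not valid UTF-8: the single byte for í is an incomplete sequence
```

Note that this only catches text that is not valid UTF-8 at all. Text that was encoded twice is still valid UTF-8, so it would pass this check and needs to be spotted by looking at the output.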


OK, so it has been baffling me that you can't get this to work, because it should work just fine. I think I have finally come up with the reason it is not working for you.

What I think is going on is that you are outputting UTF-8 characters, but your browser is displaying them in the wrong encoding.

You have a couple of options. First, if you are displaying any of this as part of an HTML page, check your meta tags to see whether they set the character encoding. If so, change it to this:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

Next, if you are just outputting this directly to the browser, use the header() function to set the character encoding, like so:

header("Content-type: text/html; charset=utf-8");

An easy test:

<?php
    header("Content-type: text/html; charset=utf-8");
    $text = "áéíó";
    echo mb_substr($text, 0, 3, 'utf-8');
?>

Without this, your browser may default to another encoding and display the text improperly. Hopefully this helps you fix the issue; if not, I'll keep trying :)


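How about trying mb_strcut()? It takes the same parameters as mb_substr(), but note that it counts the start and length in bytes rather than characters, and it avoids splitting a multi-byte character.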
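
A small sketch of the difference between the two, assuming a UTF-8 source file (the sample word is only an illustration):

```php
<?php
$text = "jardín"; // í takes two bytes in UTF-8

// mb_substr() counts characters: the first 5 characters.
$byChars = mb_substr($text, 0, 5, 'UTF-8'); // "jardí"

// mb_strcut() counts bytes: at most 5 bytes, moved back to a character boundary
// so that í is not split in half.
$byBytes = mb_strcut($text, 0, 5, 'UTF-8'); // "jard"

echo $byChars, "\n", $byBytes, "\n";
```

So mb_strcut() is the one to use for a byte budget (a fixed-size database column, say), while mb_substr() is what you want for "the first $len characters".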


This could be because your original solution truncated the string to 65 bytes. In an ASCII-only context that equates to 65 characters, but it no longer does once UTF-8's multi-byte characters are involved: a string truncated to 65 bytes contains a variable number of characters, depending on how many bytes each character takes. It is also dangerous, as you could cut a character in half (splitting its bytes).
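
To illustrate the byte/character mismatch, here is a short sketch using the plain byte-oriented functions next to their mb_ counterparts (again assuming a UTF-8 source file; the sample word is made up):

```php
<?php
$text = "jardín";

$byteCount = strlen($text);             // 7: í is stored as two bytes
$charCount = mb_strlen($text, 'UTF-8'); // 6 characters

// substr() works on bytes, so cutting at 5 keeps only the first byte of í.
$broken = substr($text, 0, 5);
$brokenIsValid = mb_check_encoding($broken, 'UTF-8'); // false: half a character at the end

// mb_substr() works on characters and keeps í intact.
$intact = mb_substr($text, 0, 5, 'UTF-8'); // "jardí"

echo $byteCount, " bytes, ", $charCount, " characters\n";
```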
