
Removing various symbols like Â é

OK, I have read many threads and found some options that work, but now I am just more curious than anything...

I am trying to remove characters like Â and é, because Google does not like them in the XML product feed.

Why does this work:

But neither of these 2 do?

$string = preg_replace("/[^[:print:]]+/", ' ', $string);

$string = preg_replace("/[^[:print:]]/", ' ', $string);

To put it all in context here is the full function:

        // Remove all unprintable characters
        $string = ereg_replace("[^[:print:]]", ' ', $string);
        // Convert to HTML entities after unprintable characters are removed
        $string = htmlentities($string, ENT_QUOTES, 'UTF-8');
        // Decode back
        $string = html_entity_decode($string, ENT_QUOTES, 'UTF-8');
        // Strip slashes and HTML tags
        $string = strip_tags(stripslashes($string));
        // Return the UTF-8 encoded string
        return utf8_encode($string);
    }


That code doesn't work because it removes only characters that are outside the POSIX [:print:] character class, which consists of the printable characters. á, É, etc. are all printable, so they are left in place.

You can find more about POSIX character classes here.
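
If the goal is simply to drop everything outside plain ASCII, one option is to name the byte range explicitly instead of relying on [:print:]. This is only a rough sketch: it assumes the input is UTF-8, that a space is an acceptable replacement, and the function name strip_non_ascii is made up for illustration. Note that it also replaces tabs and newlines, since they fall outside the range.

```php
<?php
// Hypothetical helper, not part of the original function.
// Replaces every run of bytes outside printable ASCII (0x20-0x7E) with one space.
function strip_non_ascii($string)
{
    // No /u modifier: the pattern works byte by byte, so both bytes of a
    // UTF-8 encoded "é" (0xC3 0xA9) fall outside the range and are matched.
    return preg_replace('/[^\x20-\x7E]+/', ' ', $string);
}

// "Crème brûlée" written with explicit UTF-8 byte escapes
echo strip_non_ascii("Cr\xC3\xA8me br\xC3\xBBl\xC3\xA9e"); // prints "Cr me br l e"
```

Because the range is spelled out, the result does not depend on the locale or on how a given regex engine defines "printable".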

Also, removing accented characters might not always be the best option... Check out this question for alternatives.
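
One such alternative, sketched here as an assumption rather than as what the linked question recommends, is to map accented letters to their unaccented equivalents so the words stay readable. The function name replace_accents and the short mapping table are hypothetical; a real feed would need a fuller table, and the source file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical helper: swaps a few accented letters for plain ASCII ones.
function replace_accents($string)
{
    // Deliberately small map for illustration; extend it as needed.
    $map = array(
        'á' => 'a', 'à' => 'a', 'â' => 'a', 'Â' => 'A',
        'é' => 'e', 'è' => 'e', 'ê' => 'e', 'É' => 'E',
        'û' => 'u', 'ü' => 'u', 'ç' => 'c',
    );
    // strtr() with an array replaces each key with its value in a single pass
    return strtr($string, $map);
}

echo replace_accents('Crème brûlée'); // prints "Creme brulee"
```

Characters that are missing from the map pass through unchanged, so this can be followed by a stripping step if the feed must be pure ASCII.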
