regex to filter all but whitelisted characters from a multi-language string

I am trying to clean up a string coming from a search box on a multi-language site.

Normally I would use a regex like:

$allowed = "-+?!,.;:\w\s";
$txt_search = preg_replace("/[^" . $allowed . "]?(.*?)[^" . $allowed . "]?/iu", "$1", $_GET['txt_search']);

and that works fine for English texts.

However, now I need to do the same when the texts entered can be in any language (Russian now, Chinese in the future).

How can I clean up the string while preserving "normal texts" in the original language?

I thought about switching to a blacklist (although I'd rather not...), but at the moment the regex just completely destroys all of the original input.


You can use \p{L}\p{N} (that is, [\p{L}\p{N}] as a standalone class) instead of \w, together with the u modifier; see http://www.php.net/manual/en/regexp.reference.unicode.php
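A minimal sketch of that suggestion, assuming the input is valid UTF-8 and that the goal is simply to delete every character outside the whitelist. The function name clean_search is hypothetical and not from the question; the pattern is also simplified to a single negated character class rather than the original capture-group form.

```php
<?php
// Hypothetical helper; the name and signature are assumptions for illustration.
function clean_search(string $input): string
{
    // Whitelist: the punctuation from the question, any Unicode letter (\p{L}),
    // any Unicode number (\p{N}), the underscore (which \w also allowed) and whitespace.
    $allowed = '-+?!,.;:\p{L}\p{N}_\s';

    // Delete every run of characters that is NOT in the whitelist.
    // The u modifier makes PCRE treat pattern and subject as UTF-8.
    $result = preg_replace('/[^' . $allowed . ']+/u', '', $input);

    // preg_replace() returns null on error (for example, invalid UTF-8 input).
    return $result ?? '';
}
```

Called as clean_search($_GET['txt_search']), this should leave Russian or Chinese words intact while dropping characters such as < or >. It is not a substitute for escaping the value before it goes into SQL or HTML.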


It is a common problem that Russian letters are not recognised by the \w pattern, so you can add the Cyrillic range explicitly:

$allowed = "-+?!,.;:\w\sа-я";
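Note that the literal range а-я covers U+0430 to U+044F, so it does not include ё (U+0451). One possible alternative, sketched here under the assumption that whitelisting whole scripts is acceptable, is to use PCRE script properties such as \p{Cyrillic}. The function name clean_search_scripts is hypothetical.

```php
<?php
// Hypothetical variant: whitelist whole scripts instead of a literal letter range.
function clean_search_scripts(string $input): string
{
    // \p{Latin} and \p{Cyrillic} are PCRE script properties;
    // \p{Han} could be added later to allow Chinese.
    $allowed = '-+?!,.;:\p{Latin}\p{Cyrillic}\p{N}_\s';

    // Delete every run of characters outside the whitelist (UTF-8 mode via u).
    $result = preg_replace('/[^' . $allowed . ']+/u', '', $input);

    // preg_replace() returns null on error, e.g. malformed UTF-8.
    return $result ?? '';
}
```

Compared with \p{L}, this is stricter: letters from scripts that are not listed are removed, which may or may not be what a multi-language search box wants.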
