Remove non-alphanumeric characters (including ß, Ê, etc.) from a string
Is there an easy way to remove all non-alphanumeric characters from a string in PHP without having to list them all individually in a regex function?
In the past I have been using
preg_replace("/[^a-zA-Z0-9\s\'\-]/", "", $my_string);
but this filters out important characters such as ÀÈÌÒÙß.
I need to sanitize a name field, so monetary and mathematical characters/symbols are not needed.
Like this:
preg_replace('/[^\p{L}\p{N}\s]/u', '', $my_string);
As arnaud576875 already mentioned, be aware that the pattern is treated as UTF-8 when you use the u modifier, as I did. Relevant excerpt from the manual page on pattern modifiers:
u (PCRE8)
This modifier turns on additional functionality of PCRE that is incompatible with Perl. Pattern strings are treated as UTF-8. This modifier is available from PHP 4.1.0 or greater on Unix and from PHP 4.2.3 on win32. UTF-8 validity of the pattern is checked since PHP 4.3.5.
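To see what the pattern above keeps and drops, here is a minimal sketch. The helper name strip_non_alnum and the sample string are made up for illustration, and the sketch assumes both the source file and the input are UTF-8 encoded. With the u modifier, preg_replace() returns null when the subject is not valid UTF-8, so the sketch falls back to an empty string in that case; whether that is the right fallback depends on your application.

```php
<?php
// Hypothetical helper; the name is illustrative, not from the original answer.
function strip_non_alnum(string $s): string
{
    // \p{L} = any letter, \p{N} = any number, \s = whitespace.
    // The u modifier makes PCRE treat the pattern and subject as UTF-8.
    $clean = preg_replace('/[^\p{L}\p{N}\s]/u', '', $s);

    // preg_replace() returns null on error, e.g. invalid UTF-8 input.
    return $clean ?? '';
}

// Accented letters and ß survive; the currency sign and "!" are removed.
echo strip_non_alnum('Ênio Müßig $100!'), "\n"; // Ênio Müßig 100
```

Note that this pattern also removes apostrophes and hyphens, which the pattern in the question kept on purpose.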
Use Unicode categories:
preg_replace("/[^\pL\pN\p{Zs}'-]/u", "", $my_string);
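As a rough sketch of how this variant behaves on a name field: it keeps letters, numbers, apostrophes and hyphens, and because \p{Zs} covers only space separators, tabs and newlines are removed as well. The helper name sanitize_name and the sample inputs are hypothetical, and the sketch assumes UTF-8 source and input.

```php
<?php
// Hypothetical helper; the name and sample inputs are illustrative only.
function sanitize_name(string $name): string
{
    // \pL letters, \pN numbers, \p{Zs} space separators, plus ' and -.
    // The hyphen is last in the class, so it is a literal, not a range.
    $clean = preg_replace("/[^\pL\pN\p{Zs}'-]/u", "", $name);

    // null signals a PCRE error such as invalid UTF-8 input.
    return $clean ?? '';
}

echo sanitize_name("Anne-Marie O'Neill (ß) #1"), "\n"; // Anne-Marie O'Neill ß 1

// A tab is a control character, not a space separator, so it is dropped.
echo sanitize_name("Jürgen\tÀlvarez"), "\n"; // JürgenÀlvarez
```

If tabs or newlines should be preserved, using \s in place of \p{Zs} is one option, as in the other answer's pattern.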