Remove garbage characters in Arabic
I needed to remove all non-Arabic characters from a string, and with help from people on Stack Overflow I eventually came up with the following regex, which strips every character that is not Arabic:
preg_replace('/[^\x{0600}-\x{06FF}]/u','',$string);
The problem is that it removes whitespace too. I have also discovered that I need to keep the characters A-Z, a-z, 0-9 and !@#$%^&*() as well. How should I modify the regex?
Thank you.
Add the characters you want to keep, including the space, to your character class:
preg_replace('/[^\x{0600}-\x{06FF}A-Za-z0-9 !@#$%^&*()]/u','', $string);
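As a minimal sketch of how that character class behaves, the snippet below wraps the call in a helper. The name keep_arabic_and_ascii is hypothetical and not part of the answer, and the class is assumed to include 0-9 because the question asks for digits. Note that Arabic punctuation such as ، (U+060C) falls inside \x{0600}-\x{06FF} and is therefore kept.

```php
<?php
// Hypothetical helper name, used only for illustration.
function keep_arabic_and_ascii(string $input): string
{
    // Negated class: every character NOT listed here is replaced with ''.
    // The /u modifier makes PCRE treat pattern and subject as UTF-8.
    $result = preg_replace('/[^\x{0600}-\x{06FF}A-Za-z0-9 !@#$%^&*()]/u', '', $input);

    // preg_replace() returns null on failure (for example, invalid UTF-8 input).
    return $result ?? '';
}

// The ASCII comma, semicolon, and angle brackets are removed; spaces survive.
echo keep_arabic_and_ascii("نص عربي, test 123; <ok>!"), PHP_EOL;
// prints: نص عربي test 123 ok!
```

Inside a character class the characters $, ^ (when not first), *, ( and ) are literals, so they need no escaping here.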
Assume you have this string:
$str = "Arabic Text نص عربي test 123 و,.m,............ ~~~ ٍ،]ٍْ}~ِ]ٍ}";
This keeps only Arabic letters and spaces:
echo preg_replace('/[^أ-ي ]/ui', '', $str);
This keeps Arabic and English letters, digits, and spaces:
echo preg_replace('/[^أ-يA-Za-z0-9 ]/ui', '', $str);
And this answers your question literally:
echo preg_replace('/[^أ-يA-Za-z0-9 !@#$%^&*()]/ui', '', $str);
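One difference between the answers is worth seeing in code: the range أ-ي spans U+0623 to U+064A, which is narrower than \x{0600}-\x{06FF}. It leaves out diacritics (tashkeel), Arabic-Indic digits, the Arabic comma, and a few letters such as آ (U+0622). The sketch below compares the two; keep_only is a hypothetical helper, and the sample is built from \u{...} escapes (PHP 7.0 or later) so the code points are unambiguous.

```php
<?php
// Hypothetical helper: removes every character outside the given class body.
function keep_only(string $classBody, string $input): string
{
    return preg_replace('/[^' . $classBody . ']/u', '', $input) ?? '';
}

// alef with madda (U+0622), beh (U+0628), fatha (U+064E),
// Arabic comma (U+060C), Arabic-Indic digit one (U+0661), then ASCII " x1".
$sample = "\u{0622}\u{0628}\u{064E}\u{060C}\u{0661} x1";

// Same range as the literal class [أ-ي ]: only beh and the space survive.
$letters = keep_only('\x{0623}-\x{064A} ', $sample);

// Whole Arabic block: all five Arabic code points and the space survive.
$block = keep_only('\x{0600}-\x{06FF} ', $sample);

// json_encode() shows the surviving code points as escapes for inspection.
echo json_encode($letters), PHP_EOL;
echo json_encode($block), PHP_EOL;
```

Which range is right depends on the data: the narrow one suits plain unvocalized letters, while the block range preserves vocalized text and Arabic punctuation.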
To expand on the example above, suppose this is your string:
$string = '<div>This..</div> <a>is<a/> <strong>hello</strong> <i>world</i> ! هذا هو مرحبا العالم! !@#$%^&&**(*)<>?:";p[]"/.,\|`~1@#$%^&^&*(()908978867564564534423412313`1`` "Arabic Text نص عربي test 123 و,.m,............ ~~~ ٍ،]ٍْ}~ِ]ٍ}"; ';
Code:
echo preg_replace('/[^\x{0600}-\x{06FF}A-Za-z0-9 !@#$%^&*().]/u','', strip_tags($string));
Allows:
English letters, Arabic characters (U+0600 to U+06FF), digits 0 to 9, spaces, and the characters !@#$%^&*() and the period (.)
Removes:
All HTML tags, and any special characters other than those listed above
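The two steps above (strip tags, then filter) can be sketched as a single function. The name clean_text and the final whitespace-collapsing step are assumptions added for illustration; the original code stops after the filter, which can leave runs of spaces where characters were removed.

```php
<?php
// Hypothetical helper name, for illustration only.
function clean_text(string $html): string
{
    // 1. Drop HTML tags first, so only the text content reaches the filter.
    $text = strip_tags($html);

    // 2. Keep the Arabic block, ASCII letters and digits, space, and !@#$%^&*().
    $text = preg_replace('/[^\x{0600}-\x{06FF}A-Za-z0-9 !@#$%^&*().]/u', '', $text) ?? '';

    // 3. Optional extra step (not in the original answer): collapse repeated spaces.
    $text = preg_replace('/ {2,}/', ' ', $text) ?? '';

    return trim($text);
}

echo clean_text('<p>Hello,   <b>world</b>?</p> مرحبا'), PHP_EOL;
// prints: Hello world مرحبا
```

Running strip_tags() before the filter matters: if the order were reversed, the angle brackets would be removed first and the tag names (p, b, and so on) would remain in the output as ordinary letters.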