Allow only English and Arabic characters in UTF-8 text
I am looking for a regex to change every character that is neither English nor Arabic into an underscore "_".
Currently I have the following code, which works, but I think I've got the wrong Unicode range, because it also lets through Chinese and other languages I don't need in my script:
$title=~tr/[a-z0-9_\x7f-\xff]/_/cd;
Any help would be appreciated.
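If you're seeing bytes between \x7f and \xff, your application is probably working with UTF-8 bytes, not Unicode characters. That is also why Chinese gets through: every byte of a multi-byte UTF-8 sequence (Arabic, Chinese or anything else outside ASCII) falls in the \x80-\xff range, so a byte-range class cannot tell those scripts apart. Read perldoc perlunicode, then decode() your strings (for example with the Encode module) before trying to work with them at this level.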
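A minimal sketch of the byte/character difference, assuming the core Encode module; the sample string is made up for illustration. It builds a raw UTF-8 byte string, decodes it, and shows that a byte-range class like the one in the question accepts every byte of the Chinese character, while in the decoded string that character is a single code point far above \xff.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Raw UTF-8 bytes, as read from a file or form without a decoding layer.
# Encodes "abc " + Arabic "salam" (4 letters) + " " + the Chinese character U+4E2D.
my $bytes = "abc \xD8\xB3\xD9\x84\xD8\xA7\xD9\x85 \xE4\xB8\xAD";

# Turn the bytes into a Perl character string.
my $chars = decode('UTF-8', $bytes);

# prints "bytes: 16, characters: 10"
printf "bytes: %d, characters: %d\n", length $bytes, length $chars;

# On the byte string, each byte of U+4E2D lies in \x80-\xff, so it passes.
print "byte string passes the \\x7f-\\xff filter\n"
    if $bytes =~ /^[a-z \x7f-\xff]+$/;

# On the decoded string, the Arabic and Chinese characters are single
# code points above \xff, so the same class no longer matches.
print "decoded string fails the same filter\n"
    unless $chars =~ /^[a-z \x7f-\xff]+$/;
```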
Once that's done, you should be able to search for English and Arabic characters with something like:
/[\p{ASCII}\p{Arabic}]/
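A minimal sketch of the substitution the question asks for, built on the property-based class above and assuming the text is already decoded (here `use utf8` decodes the string literal, and `use open` puts a UTF-8 layer on STDOUT). The sample title is made up. Note that \p{ASCII} also admits punctuation and control characters; if "English" should mean only letters, digits and underscore, closer to the a-z0-9_ in the original tr///, use a-zA-Z0-9_ in place of \p{ASCII}.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                              # string literals in this file are UTF-8
use open qw(:std :encoding(UTF-8));    # encode output as UTF-8

my $title = "Hello سلام 中文 123";

# Replace every character that is neither ASCII nor Arabic with "_".
(my $clean = $title) =~ s/[^\p{ASCII}\p{Arabic}]/_/g;

print "$clean\n";    # prints "Hello سلام __ 123"
```

The negated class plus s///g does what the original tr///c was aiming for, but on characters rather than bytes.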
See perldoc perluniprops for other Unicode properties you can use.
The range of the Arabic-Indic digits is: \x{0660}-\x{0669}
The ranges of the Arabic letters are: \x{0621}-\x{063A}\x{0641}-\x{064A}
The Arabic vowel marks, including "Tatweel", are: \x{0640}\x{064B}-\x{0652}
The Arabic punctuation characters are: \x{060C}\x{060D}\x{061B}-\x{061F}\x{2E2E}\x{066A}-\x{066D}
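If you'd rather allow only these specific code points instead of everything Unicode assigns to the Arabic script, the ranges listed above can be interpolated into a negated character class. This is a sketch assuming a decoded input string; the variable names $arabic and $not_allowed are illustrative, and treating "English" as ASCII letters, digits, underscore and space is an assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;
use open qw(:std :encoding(UTF-8));

# Arabic ranges from the list above, kept in single quotes so the
# \x{...} escapes are passed through to the regex engine unchanged.
my $arabic = '\x{0660}-\x{0669}'                       # Arabic-Indic digits
           . '\x{0621}-\x{063A}\x{0641}-\x{064A}'      # letters
           . '\x{0640}\x{064B}-\x{0652}'               # Tatweel and vowel marks
           . '\x{060C}\x{060D}\x{061B}-\x{061F}\x{2E2E}\x{066A}-\x{066D}';  # punctuation

# Anything that is not an ASCII letter, digit, underscore, space or one of the above.
my $not_allowed = qr/[^a-zA-Z0-9_ $arabic]/;

my $title = "Hello سلام 中文 123";
(my $clean = $title) =~ s/$not_allowed/_/g;

print "$clean\n";    # prints "Hello سلام __ 123"
```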