开发者

Removing accents chars from arabic text

I've a database with collation of utf8 / utf8_bin. DB have arabic text with accent chars (kasar etc). I want to search text without accent chars.

What i did is, i wrote 'like' query ...

WHERE replace(field1,0x[CODE],'') like '%[arabic text]%' --- where [CODE] is th开发者_StackOverflow中文版e accent char.

This is working fine when i write static utf arabic text in place of field1. But it does not replace when it run from database field.

May be db field have latin1 encoding. How can i fix / check this issue.

Thanks.


utf8_bin is a binary collation - it is extremely literal and strict in comparing characters.

utf8_general_ci is more lenient and normalizes Umlauts and accents to their "basic" version:

Ä => A

Ü => U etc.

I have never worked with Arabic before so I don't know whether it applies for those accents as well but I would expect so.

You should be able to temporarily use the lenient collation like this:

WHERE field1 like '%[arabic text]%' COLLATE utf8_general_ci;


i used the below query in postgreSQL/navicat to create a new accentless column of the original accented text


with remove_accent_1 as (

SELECT

index, text as original_text, replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace("text", 'ٍ',''), 'ِ',''), 'ً',''), 'َ',''), 'ّ',''), 'ْ',''), 'ۖ',''), 'ٌ',''), 'ُ',''), 'آ','ا'), 'ٰ ',''), ' ۚ ',''), 'ۗ ','') as new_text

FROM "arabic_text"

ORDER BY "index" ASC )

select

index, original_text, new_text

from remove_accent_1

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜