How to compute a Unicode string whose bidirectional representation is specified?

2022-12-28 19:39 Q&A author:

Fellows, I have a rather perverse question. Please forgive me :)

There's an official algorithm that describes how bidirectional Unicode text should be presented: http://www.unicode.org/reports/tr9/tr9-15.html

I receive a string (from some 3rd-party source) which contains Latin/Hebrew characters, as well as digits, whitespace, punctuation symbols, etc.

The problem is that the string I receive is already in its presentation (visual) form. That is, the sequence of characters that I receive should simply be displayed from left to right.

Now, my goal is to find the Unicode string whose representation is exactly the same. That is, I need to pass that string to another entity; it would then render the string according to the official algorithm, and the result should be the same.

Assuming the following:

The default text direction (of the rendering entity) is RTL.
I don't want to inject special Unicode characters that explicitly override the text direction (such as RLO, RLE, etc.).
I suspect there may be several solutions. If so, I'd like to preserve the RTL look of the string as much as possible. The string usually consists mostly of Hebrew words. I'd like to preserve the correct order of those words, and of the characters inside those words, whereas other character sequences may (and should) be transposed.

One naive way to solve this is to reverse the whole string (this takes care of the Hebrew words), and then reverse each sequence of non-Hebrew characters inside it. This, however, doesn't always produce correct results, because the actual rules of representation are rather complex.
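The naive approach just described might be sketched in Python as follows. This is only an illustration: the function names are made up for this example, and "Hebrew" is approximated as any character whose Unicode bidi class is R or AL, which is an assumption rather than something stated above.

```python
from itertools import groupby
import unicodedata


def is_rtl(ch):
    # Assumption: treat bidi classes R (Hebrew etc.) and AL (Arabic letters) as "RTL".
    return unicodedata.bidirectional(ch) in ("R", "AL")


def naive_visual_to_logical(visual):
    # Step 1: reverse the whole string, which puts the RTL words in logical order.
    reversed_text = visual[::-1]
    # Step 2: reverse each maximal run of non-RTL characters back again.
    parts = []
    for rtl, group in groupby(reversed_text, key=is_rtl):
        run = "".join(group)
        parts.append(run if rtl else run[::-1])
    return "".join(parts)


# Escapes are used so the source itself is not reordered by the editor.
visual = "\u05d0\u05d1\u05d2 eng 123 456 \u05d3\u05d4\u05d5"
logical = naive_visual_to_logical(visual)
print([hex(ord(c)) for c in logical])
```

Whether the result renders back to the original visual string still depends on the full bidi rules, which is exactly the weakness of this approach.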

The only comprehensive algorithm that I see so far is a brute-force check. The string can be divided into sequences of same-class characters. Those sequences may be joined in any order, and any of them may be reversed. I can check all those combinations to find the correct result. This technique can also be optimized. For instance, the order of the Hebrew words is known, so we only have to check the different combinations of the sequences that join them.
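A minimal sketch of that brute-force search, under stated assumptions: `render` is a hypothetical callable standing in for a real implementation of the bidi algorithm with RTL base direction (none is bundled with Python), and `split_runs`, `brute_force_logical` and `max_runs` are names invented for this example. Runs are taken to be maximal sequences of characters with the same bidi class.

```python
from itertools import groupby, permutations, product
import unicodedata


def split_runs(text):
    # Maximal runs of characters that share one Unicode bidi class.
    return ["".join(g) for _, g in groupby(text, key=unicodedata.bidirectional)]


def brute_force_logical(visual, render, max_runs=7):
    # `render` (hypothetical) maps a logical string to its visual form.
    runs = split_runs(visual)
    if len(runs) > max_runs:
        # n! * 2**n candidates: only feasible for very short strings.
        raise ValueError("too many runs for an exhaustive search")
    for order in permutations(runs):
        for flips in product((False, True), repeat=len(order)):
            candidate = "".join(
                run[::-1] if flip else run for run, flip in zip(order, flips)
            )
            if render(candidate) == visual:
                return candidate
    return None  # no arrangement of the runs renders to `visual`


# Toy stand-in renderer (everything displayed right to left), for demonstration only.
toy_render = lambda s: s[::-1]
print(brute_force_logical("\u05d0\u05d1 \u05d2\u05d3", toy_render) == "\u05d3\u05d2 \u05d1\u05d0")
```

The search returns the first match only; picking the candidate that best preserves the RTL look would need an extra ranking step not shown here.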

Any better ideas? If you have an idea, not necessarily the whole solution - it's ok. I'll appreciate any idea. Thanks in advance.

If you want to check whether a character is bidirectional, you have to use the UCD (Unicode Character Database), which is provided by Unicode.org and includes lots of information about characters. One of the attributes in that database gives the bidirectionality of a character.

So you have to download the UCD, then write a class that looks up your character in the XML and returns the answer.
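As an aside, some languages ship this UCD property already. In Python, for example, the standard `unicodedata` module exposes the bidi class, so a lookup can be sketched without parsing the XML yourself (the helper name `bidi_classes` is made up for this example):

```python
import unicodedata


def bidi_classes(text):
    # Bidi class of each character, e.g. "R", "AL", "L", "EN", "WS".
    return [unicodedata.bidirectional(ch) for ch in text]


# Hebrew alef, Latin "a", digit "1", space
print(bidi_classes("\u05d0a1 "))
```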

I did this in an open-source C# application, and you can find it here: http://Unicode.Codeplex.com

Please let me know whether this resolves your issue.

Nasser, thanks for the answer. Unfortunately it doesn't fully resolve my problem.

So far, for every character I can determine its directionality. Still, I don't see how I can compute the whole string so that its representation matches what I need.

Imagine you want to have the following text displayed from left to right, where Hebrew/Arabic characters are denoted by capital letters:

ABC eng 123 456 DEF

The correct string would be like this: FED 456 123 eng CBA or also: FED eng 456 123 CBA

Or, if using explicit direction override codes it can be written like this: FED eng 123 456 CBA

Currently I have solved this problem by injecting explicit directionality override codes into the string: I isolate sequences of Hebrew/Arabic words, and for all the joining LTR/weak/neutral characters I explicitly override the direction to LTR.
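A rough sketch of that override-based workaround, for illustration only. It assumes that "Hebrew/Arabic" means bidi class R or AL, and that every other run (including spaces between Hebrew words) is wrapped in LRO ... PDF; the function names are invented here and this is not the actual code in use.

```python
from itertools import groupby
import unicodedata

LRO = "\u202d"  # LEFT-TO-RIGHT OVERRIDE
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def is_rtl(ch):
    # Assumption: bidi classes R and AL stand for "Hebrew/Arabic" characters.
    return unicodedata.bidirectional(ch) in ("R", "AL")


def visual_to_logical_with_overrides(visual):
    # Split the visual string into alternating RTL / non-RTL runs.
    runs = [(rtl, "".join(group)) for rtl, group in groupby(visual, key=is_rtl)]
    parts = []
    # An RTL paragraph lays runs out right to left, so emit them in reverse order.
    for rtl, run in reversed(runs):
        if rtl:
            parts.append(run[::-1])        # RTL letters go back to logical order
        else:
            parts.append(LRO + run + PDF)  # force everything else to display LTR
    return "".join(parts)


visual = "\u05d0\u05d1\u05d2 eng 123 456 \u05d3\u05d4\u05d5"
print([hex(ord(c)) for c in visual_to_logical_with_overrides(visual)])
```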

However I'd like to do this without injecting explicit override codes.
