Getting first symbol from a glyph

Related (in fact, perhaps a duplicate of): how to extract characters from a Korean string in VBA

The linked question doesn't give me satisfactory answers and it's 2 years old so I'm making a new question.

I want to find the first symbol in a Korean glyph, i.e. "한" -> "ㅎ" or "가" -> "ㄱ". I also want to recognize inputs that are already single symbols, such as "ㄱ".

I'm working with NSString, which I believe uses UTF-8. Do I have to convert the string to EUC-KR, then start reading bytes, or what?


As a disclaimer, I have no experience working with the iPhone or NSString, except for what I've read in the documentation in order to answer this question. I'm addressing the question mainly as a Unicode problem.

In order to find the first symbol (jamo) of a Korean glyph, you have to decompose the syllable as described in my answer to how to extract characters from a Korean string in VBA (it's a new answer, so you wouldn't have seen it when you posted your question). To apply that answer (which is derived directly from the Unicode standard), you have to work with the Unicode code points (numerical values) of the Korean syllables. It looks like calling the method dataUsingEncoding: with NSUnicodeStringEncoding as the parameter should do the trick.
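To make the decomposition concrete, here is a minimal sketch in Python rather than Objective-C, since the arithmetic is the same in any language. The constants are the ones the Unicode standard uses for Hangul syllable decomposition; the function name first_jamo and the compatibility-jamo table are my own additions for illustration. The table is there because the symbols in the question ("ㅎ", "ㄱ") look like compatibility jamo rather than the conjoining jamo the decomposition itself produces.

```python
# Constants from the Unicode Hangul syllable decomposition algorithm.
S_BASE = 0xAC00            # first precomposed syllable (가)
L_BASE = 0x1100            # first conjoining leading consonant
L_COUNT = 19               # number of leading consonants
V_COUNT = 21               # number of vowels
T_COUNT = 28               # number of trailing consonants, plus "none"
N_COUNT = V_COUNT * T_COUNT    # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT    # 11172 precomposed syllables in total

# Hypothetical helper table (not part of the algorithm itself): the
# Hangul Compatibility Jamo code point for each of the 19 leading consonants.
COMPAT_INITIALS = [
    0x3131, 0x3132, 0x3134, 0x3137, 0x3138, 0x3139, 0x3141,
    0x3142, 0x3143, 0x3145, 0x3146, 0x3147, 0x3148, 0x3149,
    0x314A, 0x314B, 0x314C, 0x314D, 0x314E,
]


def first_jamo(ch, compat=True):
    """Return the leading consonant of a precomposed Hangul syllable.

    Returns None if ch is not in the Hangul Syllables block.
    With compat=True the result is a compatibility jamo (U+3131..U+314E),
    otherwise a conjoining jamo (U+1100..U+1112).
    """
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return None
    l_index = s_index // N_COUNT
    if compat:
        return chr(COMPAT_INITIALS[l_index])
    return chr(L_BASE + l_index)
```

With this sketch, first_jamo("한") gives "ㅎ" (U+314E) and first_jamo("가") gives "ㄱ" (U+3131). A character outside the Hangul Syllables block, including a lone jamo, returns None, so it has to be handled separately.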

In order to identify single symbols, you have to check whether the Unicode code point of the character you are checking is in any of the following ranges:

  • 1100-11FF (Hangul Jamo). I think this should cover most real-life cases.
  • A960-A97F (Hangul Jamo Extended-A)
  • D7B0-D7FF (Hangul Jamo Extended-B)
  • 3130-318F (Hangul Compatibility Jamo)
  • FFA0-FFDC (Halfwidth Jamo)

Check the Unicode Code Charts for a complete reference.
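The range check can be sketched the same way, again in Python as a language-neutral illustration. The ranges are exactly the ones listed above; the names JAMO_RANGES and is_jamo are made up for this example. Note that the check works on whole blocks, so it also accepts the few unassigned code points inside them.

```python
# Inclusive code point ranges for standalone jamo, as listed above.
JAMO_RANGES = [
    (0x1100, 0x11FF),  # Hangul Jamo
    (0xA960, 0xA97F),  # Hangul Jamo Extended-A
    (0xD7B0, 0xD7FF),  # Hangul Jamo Extended-B
    (0x3130, 0x318F),  # Hangul Compatibility Jamo
    (0xFFA0, 0xFFDC),  # Halfwidth Jamo
]


def is_jamo(ch):
    """Return True if the single character ch falls in one of the jamo ranges."""
    code = ord(ch)
    return any(low <= code <= high for low, high in JAMO_RANGES)
```

Under these assumptions, is_jamo("ㄱ") is true, while is_jamo("가") is false because precomposed syllables live in a different block (AC00-D7A3).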
