Getting first symbol from a glyph
Related (in fact, perhaps a duplicate of): how to extract characters from a Korean string in VBA
The linked question doesn't give me satisfactory answers and it's 2 years old so I'm making a new question.
I want to find the first symbol in a Korean glyph, i.e. "한" -> "ㅎ" or "가" -> "ㄱ". I also want to recognize inputs that are already single symbols, such as "ㄱ".
I'm working with NSString, which I believe uses UTF-8. Do I have to convert the string to EUC-KR, then start reading bytes, or what?
As a disclaimer, I have no experience working with the iPhone or NSString, except for what I've read in the documentation in order to answer this question. I'm addressing the question mainly as a Unicode problem.
In order to find the first symbol (jamo) from a Korean glyph, you have to perform a decomposition as described in my answer to how to extract characters from a Korean string in VBA (it's a new answer, so you didn't see it when you posted your question). To apply my answer (which is derived directly from the Unicode standard), you have to work with the Unicode code points (numerical values) of the Korean syllables. It looks like calling the method dataUsingEncoding, passing NSUnicodeStringEncoding as a parameter, should do the trick.
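To make the decomposition concrete, here is a minimal sketch in Python rather than Objective-C, since the arithmetic is the same in any language once you have the code point. It uses the Hangul syllable decomposition arithmetic from the Unicode standard, which is presumably what the linked answer describes. The function name first_jamo, the compatibility flag and the constant names are my own, made up for illustration.

```python
# Precomposed Hangul syllables occupy U+AC00..U+D7A3.
HANGUL_BASE = 0xAC00   # "가"
HANGUL_LAST = 0xD7A3   # "힣"
VOWEL_COUNT = 21
TAIL_COUNT = 28        # 27 final consonants plus "no final consonant"
PER_INITIAL = VOWEL_COUNT * TAIL_COUNT  # 588 syllables share each initial
INITIAL_JAMO_BASE = 0x1100              # first conjoining initial consonant

# The 19 initial consonants as Hangul Compatibility Jamo (U+3131..U+314E),
# listed in the same order the syllable block uses.
COMPAT_INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"


def first_jamo(ch, compatibility=True):
    """Return the initial consonant of a single precomposed Hangul syllable.

    Returns None when ch is not in the precomposed syllable block.
    With compatibility=True the result is a compatibility jamo such as "ㅎ"
    (U+314E); otherwise it is the conjoining jamo (U+1100..U+1112).
    """
    cp = ord(ch)  # ch must be exactly one character
    if not (HANGUL_BASE <= cp <= HANGUL_LAST):
        return None
    index = (cp - HANGUL_BASE) // PER_INITIAL
    if compatibility:
        return COMPAT_INITIALS[index]
    return chr(INITIAL_JAMO_BASE + index)
```

For "한" (U+D55C) the offset from U+AC00 is 10588, and 10588 // 588 is 18, which selects the last initial consonant, "ㅎ". The compatibility mapping is a choice on my part: the conjoining jamo U+1112 is what the decomposition yields directly, while U+314E is the standalone letter shown in the question.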
In order to identify single symbols, you have to check whether the Unicode code point of the character you are checking is in any of the following ranges:
- 1100-11FF (Hangul Jamo). I think this should cover most of the real life cases.
- A960-A97F (Hangul Jamo Extended-A)
- D7B0-D7FF (Hangul Jamo Extended-B)
- 3130-318F (Hangul Compatibility Jamo)
- FFA0-FFDC (Halfwidth Jamo)
Check the Unicode Code Charts for a complete reference.
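The range check above can be sketched the same way. Again this is Python for illustration only; the table simply restates the ranges listed in this answer, and the name jamo_block is hypothetical.

```python
# (first code point, last code point, block name), as listed in the answer.
JAMO_RANGES = [
    (0x1100, 0x11FF, "Hangul Jamo"),
    (0xA960, 0xA97F, "Hangul Jamo Extended-A"),
    (0xD7B0, 0xD7FF, "Hangul Jamo Extended-B"),
    (0x3130, 0x318F, "Hangul Compatibility Jamo"),
    (0xFFA0, 0xFFDC, "Halfwidth Jamo"),
]


def jamo_block(ch):
    """Return the name of the jamo range containing ch, or None if it is in none of them."""
    cp = ord(ch)  # ch must be exactly one character
    for low, high, name in JAMO_RANGES:
        if low <= cp <= high:
            return name
    return None
```

One thing worth checking against your own input: a standalone "ㄱ" as produced by a standard Korean keyboard is U+3131, which falls in the Hangul Compatibility Jamo range rather than 1100-11FF, so that range is likely to matter in practice.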