
Advance a UTF-8 character to the next

I want to change a UTF-8 character (which is in a gchar array), so it gets the value of the next character according to the standard. I'm using glib and I don't see a function like that. I'm thinking of a possible solution, but it would take maybe more effort and surely it wouldn't be the most efficient, as I don't know too much about encodings. Is there any library that can do that? Googling didn't help.


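This is essentially just add-and-carry modulo 64. Consider the bytes of the character as "digits": continuation bytes run from 0x80 to 0xBF, so each one holds 64 values. You increment the last byte, and if it overflows past 0xBF, reset it to 0x80 and increment the second-to-last byte. The lead byte has its own, narrower range, and the surrogate range U+D800 to U+DFFF (ed a0 80 through ed bf bf) is not valid UTF-8, so it has to be skipped.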

For example, a simple increment:

e0 b0 be -> e0 b0 bf

An increment with single carry:

e0 b0 bf -> e0 b1 80

And an increment with double carry:

e0 bf bf -> e1 80 80

When you increment past the last character of a given size, you'll need to go to the first character of the next size, which of course can't be done in-place in the middle of a string.
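
The add-and-carry idea above can be sketched in plain C roughly as follows. This is only an illustration, not a library routine: the names utf8_seq_len and utf8_increment_in_place are made up for this example, and the code assumes p points at the first byte of a valid UTF-8 sequence. It gives up (returns 0 and leaves the bytes untouched) whenever the successor would need a different number of bytes, would be a surrogate, or would lie beyond U+10FFFF.

```c
#include <stddef.h>

/* Hypothetical helper: byte length of a UTF-8 sequence, judged from its
 * lead byte. Returns 0 for bytes that cannot start a valid sequence. */
size_t utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80) return 1;
    if (lead >= 0xC2 && lead <= 0xDF) return 2;
    if (lead >= 0xE0 && lead <= 0xEF) return 3;
    if (lead >= 0xF0 && lead <= 0xF4) return 4;
    return 0;
}

/* Hypothetical helper: advance the character at p to the next code point
 * using add-and-carry on its bytes. Assumes p holds valid UTF-8.
 * Returns 1 on success; returns 0 and leaves p unchanged on failure. */
int utf8_increment_in_place(unsigned char *p)
{
    unsigned char tmp[4];
    size_t len = utf8_seq_len(p[0]);
    size_t i;

    if (len == 0) return 0;
    for (i = 0; i < len; i++) tmp[i] = p[i];

    /* Carry through trailing continuation bytes: 0xBF wraps to 0x80. */
    i = len - 1;
    while (i > 0 && tmp[i] == 0xBF) {
        tmp[i] = 0x80;
        i--;
    }

    if (i > 0) {
        tmp[i]++;                 /* a continuation byte absorbed the carry */
    } else {
        /* The carry reached the lead byte. The last lead byte of each
         * size cannot be incremented without changing the length. */
        if (tmp[0] == 0x7F || tmp[0] == 0xDF ||
            tmp[0] == 0xEF || tmp[0] == 0xF4)
            return 0;
        tmp[0]++;
    }

    /* Reject surrogates (ED A0 80 .. ED BF BF) and values above U+10FFFF. */
    if (tmp[0] == 0xED && tmp[1] >= 0xA0) return 0;
    if (tmp[0] == 0xF4 && tmp[1] >= 0x90) return 0;

    for (i = 0; i < len; i++) p[i] = tmp[i];
    return 1;
}
```

Working on a temporary copy keeps the string intact when the increment has to be refused, so the caller can fall back to rebuilding the string with a longer encoding.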


If you want to avoid direct byte-hacking, you could do something like this (untested):

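/* Needs <glib.h> and <string.h>; s is a gchar * pointing at the character. */
gunichar c;
int len, old_len;
char buf[6];  /* g_unichar_to_utf8() requires at least 6 bytes of space */

c = g_utf8_get_char(s);
old_len = g_unichar_to_utf8(c, NULL);  /* NULL buffer: only compute the length */
c += 1;
len = g_unichar_to_utf8(c, buf);
if (!g_unichar_validate(c)) {
  /* c is now a surrogate or beyond U+10FFFF: not a valid character */
} else if (len == old_len) {
  memcpy(s, buf, len);  /* same size: overwrite in place */
} else {
  /* something more complex adjusting s length */
}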

Of course writing it manually would give you more optimized code. A minor optimization to the above might use g_utf8_next_char() to get the next string position, and compute the old_len from that, instead of independently computing old_len.
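
For the branch where the length changes, one possible hand-written approach is to decode the code point, increment it, re-encode it, and build a new string around the result. The sketch below does this without GLib. All three function names (utf8_decode, utf8_encode, utf8_advance_char) are invented for this example, and it assumes the input is valid UTF-8 and that pos is the byte offset of the start of a character.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: decode one code point from valid UTF-8 at s and
 * store its byte length in *len. No validation is performed. */
uint32_t utf8_decode(const unsigned char *s, size_t *len)
{
    if (s[0] < 0x80) { *len = 1; return s[0]; }
    if (s[0] < 0xE0) {
        *len = 2;
        return ((uint32_t)(s[0] & 0x1F) << 6) | (uint32_t)(s[1] & 0x3F);
    }
    if (s[0] < 0xF0) {
        *len = 3;
        return ((uint32_t)(s[0] & 0x0F) << 12) |
               ((uint32_t)(s[1] & 0x3F) << 6) | (uint32_t)(s[2] & 0x3F);
    }
    *len = 4;
    return ((uint32_t)(s[0] & 0x07) << 18) | ((uint32_t)(s[1] & 0x3F) << 12) |
           ((uint32_t)(s[2] & 0x3F) << 6) | (uint32_t)(s[3] & 0x3F);
}

/* Hypothetical helper: encode c (at most U+10FFFF) into out, which must
 * have room for 4 bytes. Returns the number of bytes written. */
size_t utf8_encode(uint32_t c, unsigned char *out)
{
    if (c < 0x80) { out[0] = (unsigned char)c; return 1; }
    if (c < 0x800) {
        out[0] = (unsigned char)(0xC0 | (c >> 6));
        out[1] = (unsigned char)(0x80 | (c & 0x3F));
        return 2;
    }
    if (c < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (c >> 12));
        out[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (c & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (c >> 18));
    out[1] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (c & 0x3F));
    return 4;
}

/* Hypothetical helper: return a newly malloc'd copy of str in which the
 * character starting at byte offset pos is replaced by its successor.
 * Returns NULL if pos is out of range, if the successor is a surrogate or
 * above U+10FFFF, or if allocation fails. The caller frees the result. */
char *utf8_advance_char(const char *str, size_t pos)
{
    size_t total = strlen(str), old_len, new_len;
    unsigned char buf[4];
    uint32_t c;
    char *out;

    if (pos >= total) return NULL;
    c = utf8_decode((const unsigned char *)str + pos, &old_len) + 1;
    if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) return NULL;
    new_len = utf8_encode(c, buf);

    out = malloc(total - old_len + new_len + 1);
    if (out == NULL) return NULL;
    memcpy(out, str, pos);                         /* bytes before the char */
    memcpy(out + pos, buf, new_len);               /* the new character */
    memcpy(out + pos + new_len, str + pos + old_len,
           total - pos - old_len + 1);             /* the rest, with NUL */
    return out;
}
```

Allocating a fresh string sidesteps the problem of growing the character in the middle of an existing buffer, at the cost of a copy. With GLib available, the decode and encode steps would presumably be replaced by g_utf8_get_char() and g_unichar_to_utf8().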
