Trimming UTF8 buffer

2023-03-08 03:33 Q&A author:

I have a buffer with UTF-8 data. I need to remove the leading and trailing spaces. Here is C code that does it (in place) for an ASCII buffer:



char *trim(char *s)
{
  while( isspace(*s) )
    memmove( s, s+1, strlen(s) );
  while( *s && isspace(s[strlen(s)-1]) )
    s[strlen(s)-1] = 0;
  return s;
}

How can I do the same for a UTF-8 buffer in C/C++?

P.S. Thanks for the performance tip regarding strlen(). Back to the UTF-8 specifics: what if I need to remove all spaces altogether, not only at the beginning and at the tail? I may also need to remove all characters with an ASCII code < 32. Is there anything specific to the UTF-8 case here, like using mbstowcs()?

Do you want to remove all of the various Unicode spaces too, or just ASCII spaces? In the latter case you don't need to modify the code at all.
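If the Unicode spaces do need to go as well, the trim has to recognize their multibyte encodings. Below is a rough sketch of one way to do that, assuming the buffer holds valid, NUL-terminated UTF-8. The names utf8_space_len and utf8_trim are made up for this example, and the byte patterns are hand-encoded from the Unicode White_Space code points, so check them against the Unicode version you target.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: byte length of the white-space character that
   starts at p, or 0 if p does not start one. Assumes valid UTF-8. */
static size_t utf8_space_len(const unsigned char *p)
{
    if (*p == ' ' || (*p >= '\t' && *p <= '\r'))
        return 1;                                  /* ASCII white space */
    if (p[0] == 0xC2 && (p[1] == 0x85 || p[1] == 0xA0))
        return 2;                                  /* U+0085, U+00A0 */
    if (p[0] == 0xE1 && p[1] == 0x9A && p[2] == 0x80)
        return 3;                                  /* U+1680 */
    if (p[0] == 0xE2 && p[1] == 0x80 &&
        ((p[2] >= 0x80 && p[2] <= 0x8A) ||         /* U+2000..U+200A */
         p[2] == 0xA8 || p[2] == 0xA9 || p[2] == 0xAF))
        return 3;                                  /* U+2028, U+2029, U+202F */
    if (p[0] == 0xE2 && p[1] == 0x81 && p[2] == 0x9F)
        return 3;                                  /* U+205F */
    if (p[0] == 0xE3 && p[1] == 0x80 && p[2] == 0x80)
        return 3;                                  /* U+3000 */
    return 0;
}

/* Hypothetical in-place trim: one forward pass, no repeated strlen(). */
char *utf8_trim(char *s)
{
    unsigned char *p = (unsigned char *)s;
    unsigned char *first, *end;
    size_t n;

    while ((n = utf8_space_len(p)) != 0)
        p += n;                    /* skip leading white space */
    first = end = p;

    while (*p) {
        n = utf8_space_len(p);
        if (n)
            p += n;                /* possibly trailing white space */
        else
            end = ++p;             /* end = one past the last kept byte */
    }

    memmove(s, first, (size_t)(end - first));
    s[end - first] = '\0';
    return s;
}
```

Stepping one byte at a time over non-space characters is safe here because continuation bytes (0x80..0xBF) can never look like the first byte of one of the white-space sequences.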

In any case, the method you're using that repeatedly calls strlen is extremely inefficient. It turns a simple O(n) operation into at least O(n^2).

Edit: Here's some code for your updated problem, assuming you only want to strip ASCII spaces and control characters:

unsigned char *in, *out;
for (out = in; *in; in++) if (*in > 32) *out++ = *in;
*out = 0;
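For reference, the same loop wrapped in a complete function might look like the sketch below. The name strip_ascii_space_ctl and the returned length are additions for illustration only. It works on UTF-8 without changes because every byte of a multibyte character is 0x80 or higher, so the > 32 test never touches them.

```c
#include <stddef.h>

/* Hypothetical wrapper: removes every byte <= 32 (ASCII space and
   control characters) in place and returns the new length in bytes. */
size_t strip_ascii_space_ctl(char *s)
{
    unsigned char *in = (unsigned char *)s;
    unsigned char *out = in;

    for (; *in; in++)
        if (*in > 32)          /* bytes >= 0x80 (multibyte UTF-8) are kept */
            *out++ = *in;
    *out = '\0';

    return (size_t)(out - (unsigned char *)s);
}
```

Note that this leaves DEL (127) in place, since the test only covers codes up to 32.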

strlen() scans to the end of the string, so calling it multiple times, as in your code, is very inefficient.

Try looking for the first non-space and the last non-space and then memmove the substring:

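#include <ctype.h>
#include <string.h>

char *trim(char *s)
{
  char *first;
  char *last;   /* one past the last character to keep */

  /* Skip leading spaces; the cast avoids undefined behavior for bytes >= 0x80. */
  first = s;
  while(isspace((unsigned char)*first))
    ++first;

  /* Step back from the terminator over trailing spaces. */
  last = first + strlen(first);
  while(last > first && isspace((unsigned char)last[-1]))
    --last;

  /* Move the kept range to the front and terminate it. */
  memmove(s, first, last - first);
  s[last - first] = '\0';

  return s;
}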

Also remember that the code modifies its argument.
