Why is the performance of strcpy in glibc worse?

2023-04-03 05:45

I am reading the glibc 2.9 source code. The strcpy implementation there does not perform as well as I expected.

The following is the source code of strcpy in glibc2.9:

    char * strcpy (char *dest, const char* src)
    {
        reg_char c;
        char *__unbounded s = (char *__unbounded) CHECK_BOUNDS_LOW (src);
        const ptrdiff_t off = CHECK_BOUNDS_LOW (dest) - s - 1;
        size_t n;

        do {
            c = *s++;
            s[off] = c;
        }
        while (c != '\0');

        n = s - src;
        (void) CHECK_BOUNDS_HIGH (src + n);
        (void) CHECK_BOUNDS_HIGH (dest + n);

        return dest;
    }

Because I don't know the reason for using the offset, I did some performance tests by comparing the above code with the following code:

char* my_strcpy(char *dest, const char *src)
{
    char *d = dest;
    register char c;

    do {
        c = *src++;
        *d++ = c;
    } while ('\0' != c);

    return dest;
}

In my tests, the glibc strcpy performed worse. (I removed the bounded-pointer checking code before testing.)

Why does the glibc version use the offset?

Here are the details of the tests.

Platform: x86 (Intel(R) Pentium(R) 4), gcc version 4.4.2
Compile flags: none, because I don't want any optimisation. The command is gcc test.c.

The test code I used is the following:

#include <stdio.h>
#include <stdlib.h>

char* my_strcpy1(char *dest, const char *src)
{
    char *d = dest;
    register char c;

    do {
        c = *src++;
        *d++ = c;
    } while ('\0' != c);

    return dest;
}

/* Copy SRC to DEST. */
char *
my_strcpy2 (dest, src)
     char *dest;
     const char *src;
{
  register char c;
  char * s = (char *)src;
  const int off = dest - s - 1;

  do
    {
      c = *s++;
      s[off] = c;
    }
  while (c != '\0');

  return dest;
}

int main()
{
    const char str1[] = "test1";
    const char str2[] = "test2";
    char buf[100];

    int i;
    for (i = 0; i < 10000000; ++i) {
        my_strcpy1(buf, str1);
        my_strcpy1(buf, str2);
    }

    return 0;
}

When using the my_strcpy1 function, the outputs are:

[root@Lnx99 test]#time ./a.out

real    0m0.519s
user    0m0.517s
sys     0m0.001s
[root@Lnx99 test]#time ./a.out

real    0m0.520s
user    0m0.520s
sys     0m0.001s
[root@Lnx99 test]#time ./a.out

real    0m0.519s
user    0m0.516s
sys     0m0.002s

When using my_strcpy2, the outputs are:

[root@Lnx99 test]#time ./a.out

real    0m0.647s
user    0m0.647s
sys     0m0.000s
[root@Lnx99 test]#time ./a.out

real    0m0.642s
user    0m0.638s
sys     0m0.001s
[root@Lnx99 test]#time ./a.out

real    0m0.639s
user    0m0.638s
sys     0m0.002s

I know the time command is not very accurate, but the user time is enough to show the difference.

Update:

To eliminate the cost of calculating the offset, I removed that code and stored the offset in a global variable.

#include <stdio.h>
#include <stdlib.h>

char* my_strcpy1(char *dest, const char *src)
{
    char *d = dest;
    register char c;

    do {
        c = *src++;
        *d++ = c;
    } while ('\0' != c);

    return dest;
}


int off;

/* Copy SRC to DEST. */
char *
my_strcpy2 (dest, src)
     char *dest;
     const char *src;
{
  register char c;
  char * s = (char *)src;

  do
    {
      c = *s++;
      s[off] = c;
    }
  while (c != '\0');

  return dest;
}

int main()
{
    const char str1[] = "test1test1test1test1test1test1test1test1";
    char buf[100];

    off = buf-str1-1;

    int i;
    for (i = 0; i < 10000000; ++i) {
        my_strcpy2(buf, str1);
    }

    return 0;
}

But my_strcpy2 still performs worse than my_strcpy1. I then checked the generated assembly code, but that did not give me the answer either.

I also increased the size of the string, and my_strcpy1 still performs better than my_strcpy2.

It uses the offset method because this eliminates one increment from the loop - the glibc code only has to increment s, whereas your code has to increment both src and d.
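To make the difference concrete, here is a minimal sketch of the two loop shapes side by side. The names copy_two_pointers, copy_with_offset and copies_agree are made up for illustration and are not glibc's. Subtracting pointers into two unrelated objects is not defined by ISO C (glibc relies on the behaviour of its target compilers), so this sketch keeps the source and both destinations inside one array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Two-pointer form: src and d are both incremented on every iteration. */
static char *copy_two_pointers(char *dest, const char *src)
{
    char *d = dest;
    char c;

    do {
        c = *src++;
        *d++ = c;
    } while (c != '\0');

    return dest;
}

/* Offset form: only s is incremented; the destination byte is s[off]. */
static char *copy_with_offset(char *dest, const char *src)
{
    char *s = (char *)src;
    /* The -1 compensates for s having already been incremented
       by the time s[off] is written. */
    const ptrdiff_t off = dest - s - 1;
    char c;

    do {
        c = *s++;
        s[off] = c;
    } while (c != '\0');

    return dest;
}

/* Hypothetical helper: copies text with both loops inside one array
   and returns 1 if both results equal the input. */
static int copies_agree(const char *text)
{
    char arena[96];
    char *src = arena;
    char *out1 = arena + 32;
    char *out2 = arena + 64;
    size_t len = strlen(text);

    assert(len < 32);              /* each region holds at most 31 chars + NUL */
    memcpy(src, text, len + 1);

    copy_two_pointers(out1, src);
    copy_with_offset(out2, src);

    return strcmp(out1, text) == 0 && strcmp(out2, text) == 0;
}
```

Both functions copy the same bytes; the only difference is how the destination address is formed inside the loop. Whether saving one increment actually helps depends on the compiler and the CPU.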

Note that the code you're looking at is the architecture-independent fallback implementation - glibc has overriding assembly implementations for many architectures (e.g. the x86-64 strcpy()).

Based on what I'm seeing, I'm not at all surprised that your code is faster.

Look at the loops: your loop and glibc's loop are virtually identical, but glibc's version has extra code before and after the loop.

In general, simple offsets do not slow down performance because x86 allows a fairly complicated indirect-addressing scheme. So both loops here will probably run at identical speeds.

EDIT: Here's my update with the added info you gave.

Your string is only 5 characters long. Even though the offset method may be slightly faster in the long run, it needs several operations to compute the offset before the loop starts, and that slows it down for short strings. If you try larger strings, the gap may narrow and possibly vanish altogether.
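One way to check the effect of string length is to time both loops inside the program rather than with the time command. The following is only a sketch: time_copies and copy_fn are hypothetical names, clock() measures CPU time at coarse resolution, and no particular result is implied. Both strings live in one array so that the pointer subtraction in the offset version stays within a single object.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

typedef char *(*copy_fn)(char *dest, const char *src);

/* Two-pointer loop, as in my_strcpy1. */
static char *copy_two_pointers(char *dest, const char *src)
{
    char *d = dest;
    char c;

    do {
        c = *src++;
        *d++ = c;
    } while (c != '\0');

    return dest;
}

/* Offset loop, as in my_strcpy2; the setup cost is the subtraction. */
static char *copy_with_offset(char *dest, const char *src)
{
    char *s = (char *)src;
    const ptrdiff_t off = dest - s - 1;
    char c;

    do {
        c = *s++;
        s[off] = c;
    } while (c != '\0');

    return dest;
}

/* Hypothetical helper: returns the CPU seconds spent on `iterations`
   copies of a string of `len` characters. */
static double time_copies(copy_fn fn, size_t len, long iterations)
{
    static char arena[8192];       /* source in the first half, destination in the second */
    char *src = arena;
    char *dst = arena + 4096;
    /* Calling through a volatile pointer discourages the compiler
       from inlining or removing the call. */
    copy_fn volatile call = fn;
    clock_t start, end;
    long i;

    assert(len < 4096);
    assert(iterations > 0);
    memset(src, 'x', len);
    src[len] = '\0';

    start = clock();
    for (i = 0; i < iterations; ++i)
        call(dst, src);
    end = clock();

    assert(strcmp(dst, src) == 0); /* sanity check: the copy is correct */
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

For example, comparing time_copies(copy_two_pointers, 5, 10000000) with time_copies(copy_with_offset, 5, 10000000), and then repeating with len set to 40 or 4000, would show whether the gap changes with string length on your machine.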

Here is my own optimization of strcpy. I think it gave a 2x-3x speedup over a naive implementation, but it needs to be benchmarked.

https://codereview.stackexchange.com/questions/30337/x86-strcpy-can-this-be-shortened/30348#30348
