How to write a better strlen function?
I am reading "Write Great Code Volume 2" and it shows the following strlen implementation:
int myStrlen( char *s )
{
    char *start;
    start = s;
    while( *s != 0 )
    {
        ++s;
    }
    return s - start;
}
The book says that this implementation is typical of an inexperienced C programmer. I have been coding in C for the past 11 years and I can't see how to write a better function than this in C (I can think of ways to write a better one in assembly). How is it possible to write code better than this in C? I looked at the standard library implementation of strlen in glibc and couldn't understand most of it. Where can I find better information on how to write highly optimized code?
From "Optimising strlen()", a blog post by Colm MacCarthaigh:
Unfortunately in C, we’re doomed to an O(n) implementation, best case, but we’re still not done … we can do something about the very size of n.
It gives a good example of the direction you can take to speed it up. Another quote from it:
Sometimes going really really fast just makes you really really insane.
Victor, take a look at this:
http://en.wikipedia.org/wiki/Strlen#Implementation
P.S. The reason you don't understand the glibc version is probably that it reads a whole word at a time and uses bitwise tricks to find the \0.
For starters, this is worthless for encodings like UTF-8 if what you want is the number of characters: counting the characters in a UTF-8 string is more complicated, whereas the number of bytes is, of course, just as easy to calculate as in, say, an ASCII string.
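To make the byte/character distinction concrete, here is a minimal sketch that counts code points rather than bytes. It assumes the input is valid UTF-8, and the name utf8_count is made up for illustration. It relies on the fact that UTF-8 continuation bytes always have the bit pattern 10xxxxxx, so every other byte starts a new code point.

```c
#include <stddef.h>

/* Hypothetical helper: counts code points in a NUL-terminated string,
   assuming the bytes are valid UTF-8. */
static size_t utf8_count(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        /* Continuation bytes look like 10xxxxxx; any other byte
           starts a new code point. */
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

For "h\xC3\xA9llo" (the word "héllo"), strlen reports 6 bytes while this function reports 5 code points. Note that a code point is still not necessarily what a user perceives as one character (combining marks, for example).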
In general, you can optimize on some platforms by reading into larger registers. Since the other links posted so far don't have an example of that, here's a bit of pseudo-pseudocode for little-endian machines:
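size_t size = 0;
/* assumes yourstring is int-aligned and int is 32 bits wide */
const unsigned int *caststring = (const unsigned int *) yourstring;
for (;;) {
    unsigned int x = *caststring++;
    if (!(x & 0xff)) /* first byte in this int-sized package is 0 */ return size;
    else if (!(x & 0xff00)) /* second byte etc. */ return size+1;
    /* rinse and repeat depending on target architecture, i.e. twice more for 32 bit */
    else if (!(x & 0xff0000)) return size+2;
    else if (!(x & 0xff000000)) return size+3;
    size += sizeof (int);
}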
As others have pointed out, a faster algorithm reads entire words instead of individual characters and uses bitwise operations to find the terminating null. Be mindful of word-aligning your pointer if you take this approach, as some CPU architectures won't let you read words from an unaligned address (and it's a great way to trigger a segfault even on architectures that don't require alignment).
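A minimal sketch of the alignment handling described above, under some loud assumptions: the function name bounded_len_words is hypothetical, and it is a bounded (strnlen-like) variant that takes the buffer size as a parameter so it never reads past the buffer, which a plain strlen cannot promise in portable C. It walks byte by byte until the address is word-aligned, then reads one aligned word per step, then pins down the exact byte.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: length of s, examining at most max bytes. */
static size_t bounded_len_words(const char *s, size_t max)
{
    size_t i = 0;

    /* Prologue: byte by byte until s + i is word-aligned. */
    while (i < max && ((uintptr_t)(s + i) % sizeof(unsigned long)) != 0) {
        if (s[i] == '\0')
            return i;
        i++;
    }

    /* Main loop: one aligned word per iteration, only while a whole
       word still fits inside the buffer. */
    const unsigned long ones  = (unsigned long)-1 / 0xFF; /* 0x0101...01 */
    const unsigned long highs = ones * 0x80;              /* 0x8080...80 */
    while (max - i >= sizeof(unsigned long)) {
        unsigned long w;
        memcpy(&w, s + i, sizeof w); /* aligned load; memcpy avoids aliasing issues */
        if ((w - ones) & ~w & highs)
            break;                   /* some byte of w is zero */
        i += sizeof w;
    }

    /* Epilogue: find the exact position of the terminator. */
    while (i < max && s[i] != '\0')
        i++;
    return i;
}
```

Calling bounded_len_words(buf, sizeof buf) on a char array gives the same answer as strlen(buf) when the array contains a terminator, and gives max when it does not.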
Bottom line:
Great code emphasizes readability over speed in all but the most performance-critical cases. Write your code as clearly as you can and only optimize the parts that prove to be bottlenecks.
Reading a value narrower than the machine's native word can be more expensive than reading a full word, because on some machines the hardware fetches a whole word and must then do extra work (such as shifting and masking) to extract the smaller value. So it is better to read the data in machine-sized words and then use AND operations to check for zero bytes. Also, when scanning through the string, make sure the word reads start at an aligned address.
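As an illustration of checking a whole word for zero bytes with AND, here is a small sketch using a fixed 64-bit word. The function name has_zero_byte is made up for this example; the expression itself is the well-known "has zero byte" bit trick, not something taken from the book.

```c
#include <stdint.h>

/* Hypothetical helper: nonzero if any of the eight bytes of w is 0x00. */
static int has_zero_byte(uint64_t w)
{
    const uint64_t ones  = UINT64_C(0x0101010101010101);
    const uint64_t highs = UINT64_C(0x8080808080808080);

    /* Subtracting 1 from a zero byte borrows and sets that byte's
       high bit. "& ~w" discards bytes whose high bit was already
       set in w, and "& highs" keeps only the high bit of each byte. */
    return ((w - ones) & ~w & highs) != 0;
}
```

One comparison per word replaces eight per-byte comparisons; the exact byte still has to be located once the test fires.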
Answering the OP's question about where to find suggestions on writing code for performance: here's a link to the MIT OpenCourseWare course on writing optimized C code (look for the "Materials" link on the left of the page).
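The following should be faster than the naive algorithm and work for 32/64 bit. Note that it locates the terminator via the lowest set bit of the mask, so it assumes a little-endian machine.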
#include <stddef.h> /* size_t */
#define LSIZE sizeof(long)
union intptr {
    char* c;
    long* l;
};
#define aligned_(x, a) \
((unsigned long) (x) % (a) == 0)
#define punpktt_(x, from, to) \
((to) (-1)/(from) (-1)*(from) (x))
#define punpkbl_(x) \
punpktt_(x, unsigned char, unsigned long)
#define plessbl_(x, y) \
(((x) - punpkbl_(y)) & ~(x) & punpkbl_(0x80))
#define pzerobl_(x) \
plessbl_(x, 1)
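/* Converts the zero-byte mask into the index of the first zero byte. */
static inline unsigned long maskffs_(unsigned long x)
{
    unsigned long acc = 0x00010203UL;
    if (LSIZE == 8)
        acc = ((acc << 16) << 16) | 0x04050607UL;
    /* x & -x isolates the lowest set bit of the mask */
    return ((x & -x) >> 7) * acc >> (LSIZE*8-8);
}
size_t strlen(const char* base)
{
    union intptr p = { (char*) base };
    unsigned long mask;
    /* byte by byte until the pointer is long-aligned */
    for ( ; !aligned_(p.c, LSIZE); p.c++ )
        if (*p.c == 0)
            return p.c - base;
    /* one long at a time until a word contains a zero byte */
    while ( !(mask = pzerobl_(*p.l)) )
        p.l++;
    return p.c - base + maskffs_(mask);
}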