
Has anyone implemented a replacement for string.h using a struct to store the string and length?

In the C standard library, strings are implemented as an array of chars terminated by a null character: '\0'. Such ASCIIZ strings lead to inefficiency, because every time we need to know the length of a string we have to iterate over it looking for '\0'.

The way around this is to store the length of the string when we create it, e.g. using a struct as follows:

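typedef struct cstring_ {
    size_t nchars;   /* length of the string, stored once at creation */
    char chars[0];   /* zero-length trailing array: a GNU extension, not standard C */
} cstring;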

Has anyone made a shared library implementing the string.h functions, but using a struct instead of char * to pass strings around?

If not, is there a specific reason why this would be a bad idea?


There are probably dozens of those. Have a look at GLib's GString, for example.


Has anyone made a shared library implementing the string.h functions, but using a struct instead of char * to pass strings around?

I did.

Eleven years ago, when I was learning C, I reimplemented the whole <string.h> library, making sure the buffer was reallocated whenever the string needed more room.

But that was for learning purposes (I have since moved to C++ and now use std::string).

is there a specific reason why this would be a bad idea?

I guess it can be a good idea to try it yourself. With the right API, you can store alongside the string its length, the size of its buffer, and perhaps even a reference counter if you want to experiment with copy-on-write. Your string will be more complex, but more efficient than the default in some cases. And it is a good learning experience.
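To make that concrete, here is a minimal sketch of a string that tracks both its length and its buffer size and reallocates on demand. The names dstr, dstr_append and dstr_free are made up for illustration, the growth policy (doubling from 16 bytes) is an assumption, and size overflow is not checked.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* dstr is a hypothetical name for a growable, length-tracking string.
   Start from a zeroed struct: dstr s = {0}; */
typedef struct {
    char  *data;  /* heap buffer, '\0'-terminated once allocated */
    size_t len;   /* characters in use, excluding the terminator */
    size_t cap;   /* bytes allocated for data */
} dstr;

/* Append n bytes from src, doubling the buffer until it is large enough.
   src must not point into s->data. Returns 0 on success, or -1 if realloc
   fails (the string is left unchanged in that case). */
int dstr_append(dstr *s, const char *src, size_t n)
{
    assert(s != NULL && src != NULL);
    size_t need = s->len + n + 1;          /* +1 for the terminator */
    if (need > s->cap) {
        size_t cap = s->cap ? s->cap : 16; /* assumed initial capacity */
        while (cap < need)
            cap *= 2;
        char *p = realloc(s->data, cap);   /* realloc(NULL, n) acts like malloc */
        if (p == NULL)
            return -1;
        s->data = p;
        s->cap = cap;
    }
    memcpy(s->data + s->len, src, n);
    s->len += n;
    s->data[s->len] = '\0';
    return 0;
}

/* Release the buffer and reset the struct to its empty state. */
void dstr_free(dstr *s)
{
    free(s->data);
    s->data = NULL;
    s->len = s->cap = 0;
}
```

Since data stays '\0'-terminated, it can still be handed to any function that expects a const char *. A reference counter for copy-on-write would be one more field in the struct, which this sketch leaves out.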

But for production code, as always, either you are very very experienced, or you should try to find a library that will do that better than you will.

I know of some production-ready implementations of this kind of string.

Mat already mentioned GLib's GString.

If you're coding for Windows, Microsoft's BSTR (and its C++ wrapper _bstr_t) could solve your problem: a BSTR can be read like a const wchar_t * string (BSTR characters are wide), and it is managed with SysAllocString and its sister functions such as SysFreeString.

You can use these in production code, or study them for learning purposes.


From the C FAQ

Despite its popularity, the technique is also somewhat notorious: Dennis Ritchie has called it ``unwarranted chumminess with the C implementation,'' and an official interpretation has deemed that it is not strictly conforming with the C Standard, although it does seem to work under all known implementations. (Compilers which check array bounds carefully might issue warnings.)

Also, I think it should be char chars[1]; rather than char chars[0];.
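For what it's worth, C99 added flexible array members, which give the same layout without relying on the trick the FAQ describes: the last member is declared with empty brackets. A minimal sketch, assuming a C99 compiler; cstring_new is a hypothetical helper, not part of any library:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Same layout as the struct in the question, but with a C99 flexible
   array member instead of a [0] or [1] trailing array. */
typedef struct cstring_ {
    size_t nchars;   /* number of characters, not counting the terminator */
    char chars[];    /* storage follows the header in the same allocation */
} cstring;

/* Hypothetical helper: copy a C string into a freshly allocated cstring.
   Returns NULL if allocation fails; release the result with free(). */
cstring *cstring_new(const char *src)
{
    assert(src != NULL);
    size_t n = strlen(src);
    cstring *s = malloc(sizeof *s + n + 1);  /* header + chars + '\0' */
    if (s == NULL)
        return NULL;
    s->nchars = n;
    memcpy(s->chars, src, n + 1);            /* copies the terminator too */
    return s;
}
```

Keeping the terminator is a design choice in this sketch: it costs one byte and lets s->chars be passed to existing <string.h> functions.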


Yes, there's a bunch of libraries that do this, including GLib, BString, VStr and others. The problem is that they're generally quite awkward to use, or at the least require learning non-standard APIs to handle strings. (C++'s std::string would be an example of string handling done right, but it depends on a lot of C++ features.)

If you're afraid of the cost of strlen, then you should compute the length of the string "manually" while operating on it, and perform most operations with memcpy and direct access to the characters. That's only useful in tight loops, though.
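As a rough illustration of tracking the length by hand, the sketch below joins several words into one buffer. It remembers where the output currently ends, so each append is a memcpy and the destination is never re-scanned the way repeated strcat calls would. join_words is a made-up name, and the single-space separator is just an assumption for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: concatenate `count` words into dst, separated by
   single spaces. Returns the length written, or (size_t)-1 if dst is too
   small (dst contents are unspecified in that case). */
size_t join_words(char *dst, size_t dstsize,
                  const char *const *words, size_t count)
{
    assert(dst != NULL && dstsize > 0);
    size_t len = 0;                        /* running length of dst */
    for (size_t i = 0; i < count; i++) {
        size_t n = strlen(words[i]);       /* each word is scanned once */
        size_t sep = (i > 0) ? 1 : 0;
        if (n + sep >= dstsize - len)      /* keep room for the terminator */
            return (size_t)-1;
        if (sep)
            dst[len++] = ' ';
        memcpy(dst + len, words[i], n);    /* append at the known end */
        len += n;
    }
    dst[len] = '\0';
    return len;
}
```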


I implemented something like this in one of my projects (though I used a class instead of a struct). It is easy to implement. It is also a good idea to store everything, including the length, in one memory area, and to represent a string as a pointer to the beginning of the string data itself.
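A sketch of that layout in C might look like the following. This is not the code from my project; the pstr_* names are hypothetical. The length lives in a small header placed just before the characters, and callers only ever see a pointer to the characters.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hidden header stored immediately before the character data. */
typedef struct {
    size_t len;
} pstr_header;

/* Allocate header + data in one block and return a pointer to the data,
   so the result can be passed to anything expecting a const char *. */
char *pstr_new(const char *src)
{
    assert(src != NULL);
    size_t n = strlen(src);
    pstr_header *h = malloc(sizeof *h + n + 1);
    if (h == NULL)
        return NULL;
    h->len = n;
    char *data = (char *)(h + 1);          /* first byte after the header */
    memcpy(data, src, n + 1);              /* copies the terminator too */
    return data;
}

/* O(1) length: step back from the data pointer to the header.
   Only valid for pointers returned by pstr_new. */
size_t pstr_len(const char *s)
{
    assert(s != NULL);
    return ((const pstr_header *)s - 1)->len;
}

/* Must be used instead of free(), since s is not the start of the block. */
void pstr_free(char *s)
{
    if (s != NULL)
        free((pstr_header *)s - 1);
}
```

The catch is that nothing in the type system distinguishes such a pointer from an ordinary char *, so calling pstr_len on a plain C string is undefined behaviour.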


I find that whenever I need the length of a "string", what I really need to know is whether the string is empty or whether I've reached its end. Other times I need to iterate through the chars anyway, so I can just as easily check for the terminating '\0'.
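Both cases can be handled without ever calling strlen. A small sketch, with hypothetical function names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* An empty string is one whose first character is the terminator. */
bool str_is_empty(const char *s)
{
    assert(s != NULL);
    return s[0] == '\0';
}

/* Count occurrences of c in one pass, stopping at the terminator. */
size_t str_count_char(const char *s, char c)
{
    assert(s != NULL);
    size_t count = 0;
    for (const char *p = s; *p != '\0'; p++) {
        if (*p == c)
            count++;
    }
    return count;
}
```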

So, let me rephrase your question: is there a specific reason you think this is a better idea?


I don't think it's a bad idea; in fact, the C++ implementation of string works just like you said. There are also C implementations such as GString in GLib, which is almost a standard library in the Linux world. I think the reason it's not part of the standard C library is that C has such a long history, and most developers and projects are used to the original C-style strings.
