Designing a string class in C++
I need to design (and code) a "customized" string class in C++. I am seek开发者_开发知识库ing any documentation and pointers on design issues or potential pitfalls I should be aware of.
Links are very welcome, as are the identification of problems (if any) with current string libs (Qstring, std::string, and the others).
Despite the critics, I think this is a valid question.
The std::string
is not a panacea. It looks like someone took the class from a pure-OO and dumped it in C++, which is probably the case.
Advice 1: Prefer non-member non-friend methods
Now that this is said, in this hour of internationalization, I would certainly advise you to design a class that would support Unicode
. And I do say Unicode
, not UTF-8
or UTF-16
. It's ill-fitting (I think) to devise a class that would contain the data in a given encoding. You can provide methods to then output the information in various formats.
Advice 2: Support Unicode
Then, there is a number of points on the memory allocation schemes:
- Small String Optimization: the class contains pre-allocated space for a few characters (a dozen or two), and thus avoid heap allocation for those
- Copy On Write: the various strings share a buffer so that copy is cheap, when one string needs to modify its content, it copies the buffer if it's not the sole owner --> the issue is that multithreading introduces overhead here and it's been showed that for a general purpose technic this overhead could dwarf the actual copying cost
- Immutability: "new" languages such as
Java
,C#
orPython
use immutable strings. Think of it as a pool of strings, all strings containing "Fooo" will point to the same buffer. Note that these languages support garbage collection, which rather helps here.
I would personally pick the "Small String Optimization" here (though it's not exclusive with the other two), simply because it's simple to implement and should actually benefit you (heap allocation cost, locality of reference issues).
The other two technics are somewhat complex in the face of multi-threading, and such are likely error-prone and unlikely to yield any real benefit unless carefully crafted.
And that brings my last advice:
Advice 3: Don't implement internal locking in an attempt of MultiThreading support
It will slow down the class when used in SingleThreaded context and will not yield as much benefit as you'd think when used in a MultiThreaded one.
Finally, you could perhaps find something suiting your tastes (or get some pointers) by browsing existing code. I don't promise to exhibit "smooth" interfaces though:
- ICU UnicodeString: Unicode support, at least
- std::string: over 100 member methods (counting the various overloads)
- llvm StringRef: note how many algorithms are implemented as member methods :'(
Effective STL by Scott Meyers has some interesting discussion about possible std::string
implementation techniques, though it covers rather advanced issues such as copy-on-write and reference counting.
Depending on what the "customization" is (e.g. a custom allocator), you may be able to do it via a template parameter of the std::basic_string class.
Herb Sutter gives a sample of a custom string class in the GotW #29. You could use it for the start.
From a general-purpose point of view a "new" string class ideally combined the good points of std::string, CString, QString and others. A few points in random order:
- MFC CString supports using it in printf-like functions due to a very specific implementation. If you need or want this feature I recommend buying the book "MFC Internals" by George Sheperd. Although the book is from 1996(!) it's description of how CString is implemented should be worth it. http://www.amazon.com/MFC-Internals-Microsoft-Foundation-Architecture/dp/0201407213/ref=sr_1_1?ie=UTF8&s=books&qid=1283176951&sr=8-1
- Check that your string class plays nicely with all interfaces you'll use it with (iostreams, Windows API, printf*, etc.)
- Don't aim for full unicode support (as in: collation, grapheme clusters, ...) as that will mean your class will never be done, but consider making it a wchar_t class with conversion options.
- Consider making the ctor/function that creates your string objects from char* always take the specific encoding of the character arrays. (Can be helpful in mixed UTF-8 / other character sets environments.)
- Look at the full CString interface and at the full std:string interface and decide what you are going to need and what you can skip.
- Look at QString to see what the other two miss.
- Do not provide implicit conversion to neither char/wchar_t*
- Consider adding convenient conversion functions to/from numeric types.
- Don't write a string class without a full set of detailed Unit Tests!
The world doesn't need another string class. Is this homework? If not, use std::string
.
The problem with std::string is.. that you can't change it. Sometimes you need the basics of a std::string, but disagree with the implementation of your c++ library.
As an example, thread-safe reference counting employed means lots of locking (or at least locked operations). Also, if most of your strings are short (because you know this will be the case), you might want a string class that is optimized for that use-case.
So even if you like the std::string API, or at least have learned to live with it, there is room for 'competing implementations' that are more or less workalikes.
PowerDNS would love to have one, as we currently pass many dns host names around, and a large majority of them would fit in a, say, 25 bytes fixed buffer, which would relieve a lot of new/delete pressure.
精彩评论