How does the string class in the C++ standard library work?
I'm afraid I don't know templates (or C++, really), but I know algorithms and data structures (even some OOP! :). Anyway, to make the question a bit more precise, here is what I would like the answer to cover (among other things I don't know to ask about in advance):
- Why is it coded as a template?
- How does the template work?
- How does it do mem allocation?
- Why is it (or is it not) better than mere null-terminated char arrays?
std::string is actually a typedef for std::basic_string<char>, and therein lies the answer to your #1 above. It's a template in order to make basic_string work with pretty much anything: char, unsigned char, wchar_t, pizza, whatever (as long as there is a suitable character-traits class for it). string itself is just a programmer convenience that uses char as the datatype, since that's what's often wanted.

Unanswerable as asked. If you're confused about something, please try to narrow it down a bit.
There are two levels to the answer. From the application-layer point of view, all basic_string objects use an allocator object to do the actual allocation. Below that, allocation methods may vary from one implementation to the next, and for different template parameters, but in practice they will use new at the lower levels to allocate and manage the contained resource.

It's better than mere char arrays for a wide variety of reasons.

string manages the memory for you. You never have to allocate buffer space yourself when you add or remove data. If you add more than will fit in the currently allocated buffer, string will reallocate it for you behind the scenes. In this regard, string can be thought of as a kind of smart pointer: for the same reasons smart pointers are better than raw pointers, strings are better than raw char arrays.

Type safety. This may seem a little convoluted, but string used properly has better type safety than char buffers. Consider a common scenario:
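#include <cstdio>   // sprintf
#include <string>
#include <sstream>
using namespace std;
int main()
{
    // C style: nothing checks that the format string matches the arguments.
    const char* jamorkee_raw = "jamorkee";
    char raw_buf[0x1000] = {};
    // Deliberate bug: %f expects a double, but a const char* is passed.
    // This compiles (at most with a warning) and has undefined behavior.
    sprintf( raw_buf, "This is my string. Hello, %f", jamorkee_raw);

    // C++ style: operator<< is selected by the argument's type,
    // so a format/type mismatch like the one above cannot occur.
    const string jamorkee_str = "jamorkee";
    stringstream ss;
    ss << "This is my string. Hello " << jamorkee_str;
    string s = ss.str();
}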
The type-safety issue raised above by using a raw char buffer isn't even possible when using string along with streams.
A rather quick (and therefore probably incomplete) shot at answering some of the questions:

- Why is it coded as a template?

Templates provide the capability for the class functions to work on arbitrary data types. For example, the basic_string<> class template can work on char units (which is what the std::string typedef does) or wchar_t units (std::wstring) or any POD type. Using something other than char or wchar_t is unusual (std::vector<> would more likely be used), but the possibility exists.
- How does it do mem allocation?

This isn't specified by the standard. In fact, the basic_string<> template allows an arbitrary allocator to be used for the actual allocation of memory (but doesn't determine at what points allocations might be requested). Some implementations might store short strings in actual class members, and only allocate dynamically when the strings grow beyond a certain size. The size requested might be exactly what's needed to store the string, or it might be a multiple of that size to allow for growth without a reallocation.
Additional information stolen from another SO answer:

Scott Meyers's book, Effective STL, has a chapter on std::string implementations that's a decent overview of the common variations: "Item 15: Be aware of variations in string implementations".

He talks about 4 variations:

- several variations on a ref-counted implementation (commonly known as copy on write): when a string object is copied unchanged, the refcount is incremented but the actual string data is not copied. Both objects point to the same refcounted data until one of them modifies it, causing a 'copy on write' of the data. The variations are in where things like the refcount, locks, etc. are stored.
- a "short string optimization" implementation. In this variant, the object contains the usual pointer to data, length, size of the dynamically allocated buffer, etc. But if the string is short enough, it will use that area to hold the string instead of dynamically allocating a buffer.
- Why is it (or is it not) better than mere null-terminated char arrays?

One way the string class is better than a mere null-terminated array is that the class manages the memory required, so defects involving allocation errors or overrunning the end of the allocated arrays are reduced. Another (perhaps minor) benefit is that you can store null characters in the string. A drawback is that there's perhaps some overhead, especially since you pretty much have to rely on dynamic memory allocation for the string class. In most scenarios that's probably not a major issue; on some setups (embedded systems, for example) it can be a problem.
string is not the template; string is a specialization of the basic_string class template for char. It's a template so that, for example, you can typedef wstring, which specializes on wide characters, and use all the same code for the encapsulated value.

See @Gman's comment. Compile-time code reuse, while retaining the ability to selectively special-case, is the basic rationale for templates.
Implementation dependent. Some do single-instance allocation, with copy on write. Some use a built-in buffer for small strings and allocate from the heap only after a certain size is reached. I suggest you investigate how it works on your compiler by stepping through the constructor and follow-on code in <string>, as that will help you understand #2 hands-on, which is way more valuable than just reading about it (though a book or other reading is a great idea for an intro to templates).

Because const char* and the CRT that supports it is a bug farm for the unwary. Check out all the stuff you get for free with std::string, plus a whole bunch of Standard C++ algorithms that work with string iterators.
Why is it coded as a template?
Several people have given the answer that having std::basic_string be a template means that you can have both std::basic_string<char> and std::basic_string<wchar_t>. What nobody has explained is why C and C++ have multiple character types in the first place.

C, especially in its early versions, was minimalistic about data types. Why have bool when the integers 0 and 1 work just fine? And why have distinct types for "byte" and "character" when they're both 8 bits?

The problem is that 8 bits limits you to 256 characters, which is adequate for an alphabetic language like English or Russian, but nowhere near enough for Japanese or Chinese. And now we have Unicode with its 21-bit code points. But char couldn't be expanded to 16 or 32 bits because the assumption that char = byte was so entrenched. So we got a separate type for "wide characters".
But now we have the problem that wchar_t is UTF-32 on Linux but UTF-16 on Windows. To solve that problem, the next version of the C++ standard (C++0x, since published as C++11) adds the char16_t and char32_t types (and the corresponding string types, std::u16string and std::u32string).
A good free online resource is "Thinking in C++" by Bruce Eckel, whose site is here: http://mindview.net/Books/TICPP/ThinkingInCPP2e.html .
The second volume of his free book is mirrored here: http://www.smart2help.com/e-books/ticpp-2nd-ed-vol-two/#_ftnref14 . Chapter three is all about the string class, why it's a template, and why it's useful.