Modifying the underlying char array of a C++ string object

My code is like this:

string s = "abc";
char* pc = const_cast<char*>( s.c_str() );
pc[ 1 ] = 'x';
cout << s << endl;

When I compiled the snippet above with GCC, I got "axc" as expected. My question is: is it safe and portable to modify the underlying char array of a C++ string this way? Or are there alternative approaches for manipulating a string's data directly?

FYI, my intention is to write some pure C functions that can be called from both C and C++, so they can only accept char* as arguments. I know that going from char* to string involves copying, and I would like to avoid that penalty. Could anybody suggest how to deal with this sort of situation?


To the first part, c_str() returns const char* and it means what it says. All the const_cast achieves in this case is that your undefined behavior compiles.

To the second part, in C++0x std::string is guaranteed to have contiguous storage, just like std::vector in C++03. Therefore you could use &s[0] to get a char* to pass to your functions, as long as the string isn't empty. In practice, all string implementations currently in active development already have contiguous storage: there was a straw poll at a standard committee meeting and nobody offered a counter-example. So you can use this feature now if you like.
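As a minimal sketch of the &s[0] approach, assuming contiguous storage as described above: upper_in_place stands in for one of your C functions and upper is a wrapper around it. Both names are made up for illustration. The C-style function takes an explicit length, so it never depends on a nul terminator.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical C-style routine (in real code it would live in a .c file):
// upper-cases len bytes in place and never looks for a nul terminator.
void upper_in_place(char* buf, std::size_t len)
{
    assert(buf != nullptr || len == 0);
    for (std::size_t i = 0; i < len; ++i)
        buf[i] = static_cast<char>(
            std::toupper(static_cast<unsigned char>(buf[i])));
}

// Hypothetical C++ wrapper: hands the string's own storage to the C routine.
void upper(std::string& s)
{
    if (s.empty())
        return;                       // only take &s[0] of a non-empty string
    upper_in_place(&s[0], s.size());  // relies on contiguous storage
}
```

Calling upper(s) on a string holding "abc" leaves it holding "ABC", with no copy made. The function only overwrites existing characters; it cannot grow or shrink the string.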

However, std::string uses a fundamentally different string format from C-style strings: it is data plus length rather than nul-terminated. If you modify the string data from your C functions, you can't change the length of the string, and you can't be sure there's a nul byte at the end without calling c_str(). A std::string can also contain embedded nuls that are part of the data, so even if you did find a nul, without knowing the length you still don't know that you've found the end of the string. You're very limited in what you can do in functions that must operate correctly on both kinds of data.
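To illustrate the embedded-nul point, here is a small sketch. The helper names c_length and make_with_embedded_nul are invented for this example. A C function that scans for a nul sees a shorter string than the std::string actually holds.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Length as a C function would see it: scan until the first nul byte.
std::size_t c_length(const std::string& s)
{
    return std::strlen(s.c_str());
}

// Hypothetical helper: builds a 7-character string with a nul in the middle.
// The explicit count is needed, otherwise the constructor stops at the nul.
std::string make_with_embedded_nul()
{
    std::string s("abc\0def", 7);
    assert(s.size() == 7);
    return s;
}
```

For the string returned by make_with_embedded_nul(), size() reports 7 while c_length reports 3, so any C code relying on the terminator silently ignores "def".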


(a) This is not necessarily the underlying string. std::string::c_str() should be a copy of the underlying string (though a bug in the C++ Standard means that, actually, it's often not... I believe that this is fixed in C++0x).

(b) const_casting away the constness only hacks the variable type: the actual object is still const, and your modifying it is Undefined Behaviour — very bad.

Simply speaking, do not do this.
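If you want to stay strictly within defined behaviour, one option (just a sketch, and it does pay for the copy) is to copy the characters into a buffer you own, let the C function modify that, and build a new string from the result. reverse_c_string and reversed are hypothetical names used only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical C routine expecting a writable, nul-terminated buffer.
void reverse_c_string(char* str)
{
    assert(str != nullptr);
    std::size_t n = std::strlen(str);
    for (std::size_t i = 0; i < n / 2; ++i) {
        char tmp = str[i];
        str[i] = str[n - 1 - i];
        str[n - 1 - i] = tmp;
    }
}

std::string reversed(const std::string& s)
{
    // Copy the characters plus the terminator supplied by c_str().
    std::vector<char> buf(s.c_str(), s.c_str() + s.size() + 1);
    reverse_c_string(&buf[0]);     // modifies our copy, never s itself
    return std::string(&buf[0]);   // reads up to the first nul only
}
```

Here reversed("abc") yields "cba" and the original string is untouched. Note that a string with embedded nuls would be cut off at the first one, since both the C routine and the final constructor stop there.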


Can you use &myString[0] at all? It has a non-const version; then again, it's stated to be the same as data()[0] which has no non-const version. Someone with a decent library reference to hand can clear this up.


The obvious answer is no, it's undefined behavior. On the other hand, if you do:

char* pc = &s[0];

you can access the underlying data, in practice today, and guaranteed in C++11.


As others said, it is not portable. But there are more dangers. Some std::string implementations (I know that GCC does it) use COW (copy on write).

#include <iostream>
#include <string>

int main()
{

    std::string x("abc");
    std::string y;
    y = x; // x and y share the same buffer

    std::cout << (void*)&x[0] << '\n';
    std::cout << (void*)&y[0] << '\n';

    x[0] = 'A'; // COW triggered

    // x and y no longer share the same buffer
    std::cout << (void*)&x[0] << '\n';
    std::cout << (void*)&y[0] << '\n';

    return 0;
}


This is relying on undefined behaviour, and is therefore not portable.


This depends on your standard library implementation. In the GNU C++ standard library (libstdc++), std::string is implemented using a copy-on-write (CoW) pattern. Thus, if multiple std::string objects are copies of one another, they will internally all point to the same data, and if you modify any of them using the method shown in your question, the content of all of the (seemingly) unrelated std::string objects will change.

On Windows, I think the implementation doesn't use CoW, so I'm not sure what would happen there.

Anyway, it's undefined behavior, so I'd stay clear of it. Chances are, even if you get it working, you'll eventually start running into very hard-to-trace bugs.


You should not mess with the underlying string. At the end of the day, a string is an object. Would you mess with any other object this way?

Have you profiled your code to see whether there actually is a penalty?
