What to watch out for when converting a std::string to a char* for a C function?
I have read many posts asking how to convert a C++ std::string or const std::string& to a char* in order to pass it to a C function, and it seems there are quite a few caveats in doing this. One has to beware of whether the string is contiguous, and a lot of other things. The point is that I've never really understood all the points one needs to be aware of, and why.
I wondered if someone could sum up the caveats and pitfalls of converting a std::string to the char* that is needed to pass to a C function?
This both when the std::string is a const reference and when it's just a non-const reference, and both when the C function will alter the char* and when it will not alter it.
First, whether const reference or value doesn't change anything.
You then have to consider what the function is expecting. There are different things which a function can do with a char* or a char const*: the original versions of memcpy, for example, used these types, and it's possible that there is still such code around. It is, hopefully, rare, and in the following, I will assume that the char* in the C function refers to '\0'-terminated strings.
If the C function takes a char const*, you can pass it the results of std::string::c_str(); if it takes a char*, it depends. If it takes a char* simply because it dates from the pre-const days of C, and in fact it modifies nothing, std::string::c_str() followed by a const_cast is appropriate. If the C function is using the char* as an out parameter, however, things become more difficult. I personally prefer declaring a char[] buffer, passing this, and then converting the results to std::string, but all known implementations of std::string use a contiguous buffer, and the next version of the standard will require it, so correctly dimensioning the std::string first (using std::string::resize()), then passing &s[0], and afterwards redimensioning the string to the resulting length (determined using strlen(s.c_str()), if necessary) can also be used.
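The two out-parameter approaches described above might look roughly like the following. This is only a sketch: legacy_get_name is a made-up stand-in for a real C API, and the 64-byte capacity is an arbitrary assumption. The second approach relies on std::string storage being contiguous, which is guaranteed from C++11 onwards.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical C-style function (stand-in for a real C API): writes a
// '\0'-terminated string of at most cap - 1 characters into out.
extern "C" void legacy_get_name(char* out, std::size_t cap) {
    assert(cap > 0);
    std::strncpy(out, "example", cap - 1);
    out[cap - 1] = '\0';
}

// Approach 1: a plain char[] buffer, converted to std::string afterwards.
std::string name_via_array() {
    char buffer[64];  // capacity chosen arbitrarily for this sketch
    legacy_get_name(buffer, sizeof buffer);
    return std::string(buffer);
}

// Approach 2: size the string first, pass &s[0], then shrink it to the
// length the C function actually produced.
std::string name_via_string() {
    std::string s;
    s.resize(64);
    legacy_get_name(&s[0], s.size());
    s.resize(std::strlen(s.c_str()));
    return s;
}
```

Both functions return the same text; the difference is only in who owns the scratch buffer while the C function writes to it.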
Finally (but this is also an issue for C programs using char[]), you have to consider any lifetime issues. Most functions taking char* or char const* simply use the pointer and forget it, but if the function saves the pointer somewhere for later use, the string object must live at least as long, and its size should not be modified during that period. (Again, in such cases, I prefer using a char[].)
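For the case where the C side keeps the pointer, one possible way to handle the lifetime issue is to hand it a separate copy whose lifetime you control, so that later changes to the std::string cannot affect it. The sketch below assumes C++11; legacy_set_label, legacy_get_label and LabelHolder are invented names for illustration, not part of any real library.

```cpp
#include <string>
#include <vector>

// Hypothetical C API that stores the pointer it is given for later use.
static const char* g_saved_label = nullptr;
extern "C" void legacy_set_label(const char* label) { g_saved_label = label; }
extern "C" const char* legacy_get_label() { return g_saved_label; }

// Owns a private, fixed-size copy of the text for as long as the C side
// holds the pointer, and unregisters the pointer on destruction.
class LabelHolder {
public:
    explicit LabelHolder(const std::string& s)
        // Copy the characters plus the terminating '\0'.
        : storage_(s.c_str(), s.c_str() + s.size() + 1) {
        legacy_set_label(storage_.data());
    }
    ~LabelHolder() { legacy_set_label(nullptr); }

    // Copying would leave the C side pointing at the wrong buffer.
    LabelHolder(const LabelHolder&) = delete;
    LabelHolder& operator=(const LabelHolder&) = delete;

private:
    std::vector<char> storage_;  // never resized after construction
};
```

With this arrangement the original std::string can be modified or destroyed freely; only the LabelHolder has to outlive the C function's use of the pointer.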
Basically, there are three points that are important:
1. According to the still current standard, std::string isn't actually guaranteed to use contiguous storage (as far as I know this is due to change). But in fact, all current implementations probably use contiguous storage anyway. For that reason, c_str() (and data()) may actually create a copy of the string internally …
2. The pointer returned by c_str() (and data()) is valid only as long as no non-const methods on the original string are invoked. This makes its use unsuitable when the C function hangs on to the pointer (as opposed to only using it for the duration of the actual function call).
3. If there is any chance at all that the string is going to be modified, casting away constness from the result of c_str() is not a good idea. You must create a buffer with a copy of the string, and pass that into the C function. If you create a buffer, remember to add a null termination.
[I would add a comment, but I don't have enough rep for that, so sorry for adding (yet) another answer.]
While it is true that the current standard does not guarantee the internal buffer of std::string to be contiguous, it appears that practically all implementations use contiguous buffers. Furthermore, the new C++0x standard (which is about to be approved by ISO) requires contiguous internal buffers in std::string, and even the current C++03 standard requires returning a contiguous buffer when you call data() or &str[0] (though it won't necessarily be null-terminated). See here for more details.
That still doesn't make it safe to write to the string though, since the standard doesn't force implementations to actually return their internal buffer when you call data(), c_str() or operator[], and neither are they prevented from using optimizations like copy-on-write, which may complicate things further (it appears that the new C++0x will ban copy-on-write though). That being said, if you don't care about maximum portability, you can check your target implementation and see what it actually does inside. AFAIK, Visual C++ 2008/2010 always returns the real internal buffer pointer, and doesn't do copy-on-write (it does have the Small String Optimization, but that's probably not a concern).
When the C function does not alter the string behind the char*, you can use std::string::c_str() for both const and non-const std::string instances. Ideally it would be a const char*, but if it's not (because of a legacy API) you may legally use a const_cast.
But you may only use the pointer from c_str() as long as you're not modifying the string!
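As a rough illustration of the read-only legacy case, here is a sketch with a hypothetical function, legacy_count_spaces, that is declared with char* but never writes through it. The cast is only defensible under that assumption; if the C function did write, the behaviour would be undefined.

```cpp
#include <cstddef>
#include <string>

// Hypothetical legacy C function: takes char* for historical reasons,
// but only reads the characters.
extern "C" std::size_t legacy_count_spaces(char* text) {
    std::size_t n = 0;
    for (; *text != '\0'; ++text) {
        if (*text == ' ') {
            ++n;
        }
    }
    return n;
}

// Wrapper that works for const and non-const strings alike.
std::size_t count_spaces(const std::string& s) {
    // Acceptable only because legacy_count_spaces never modifies the data.
    // The pointer is used during the call and not kept afterwards.
    return legacy_count_spaces(const_cast<char*>(s.c_str()));
}
```

Keeping the const_cast inside one small wrapper makes it easier to audit the assumption that the C function is read-only.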
When the C function does alter the string behind the char*, your only safe and portable way to use the std::string is to copy it to a temporary buffer yourself (for example from c_str())! Make sure you free the temporary memory afterwards, or use std::vector, which is guaranteed to have contiguous memory.
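A minimal sketch of the temporary-buffer approach using std::vector<char> could look like this. legacy_to_upper is an invented example of a C function that modifies its argument in place without changing its length; a function that can grow the text would need a larger buffer.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical C function that modifies its argument in place.
extern "C" void legacy_to_upper(char* text) {
    for (; *text != '\0'; ++text) {
        *text = static_cast<char>(
            std::toupper(static_cast<unsigned char>(*text)));
    }
}

std::string to_upper_copy(const std::string& s) {
    // Copy the characters plus the terminating '\0' into a vector, which
    // has contiguous storage and releases its memory automatically.
    std::vector<char> buf(s.c_str(), s.c_str() + s.size() + 1);
    legacy_to_upper(&buf[0]);
    // Convert the modified, '\0'-terminated buffer back to a std::string.
    return std::string(&buf[0]);
}
```

The original string is never touched, so this works equally well for const and non-const std::string arguments.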
std::string can store zero bytes. This means that when it is passed to a C function, it can be truncated prematurely, as C functions will stop at the first zero byte. This can have security implications if, for example, you use a C function to filter out or escape unwanted characters.
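To make the truncation concrete, here is a small sketch comparing what std::string reports with what a C function such as strlen sees. The helper names c_visible_length and has_embedded_nul are made up for this example.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Length as a C function would see it: counting stops at the first '\0'.
std::size_t c_visible_length(const std::string& s) {
    return std::strlen(s.c_str());
}

// True if the string contains a zero byte before its real end, meaning a
// C function would only ever process a prefix of it.
bool has_embedded_nul(const std::string& s) {
    return c_visible_length(s) != s.size();
}
```

For a string built as std::string("ab\0cd", 5), size() is 5 but c_visible_length returns 2, so the "cd" part would bypass any check done on the C side. A check like has_embedded_nul before the call is one way to reject such input.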
The result of std::string::c_str() will sometimes be invalidated by operations that change the string (non-const member functions). Using the pointer after you call c_str() and then modify the string causes bugs that are very hard to diagnose ("Heisenbugs").
Do not use const_cast, ever. goto is less troublesome.