
std::string.resize() and std::string.length()

I'm relatively new to C++ and I'm still getting to grips with the C++ Standard Library. To help transition from C, I want to format a std::string using printf-style formatters. I realise stringstream is a more type-safe approach, but I find printf-style much easier to read and deal with (at least, for the time being). This is my function:


using namespace std;

string formatStdString(const string &format, ...)
{
    va_list va;
    string output;
    size_t needed;
    size_t used;

    va_start(va, format);
    needed = vsnprintf(&output[0], 0, format.c_str(), va);
    output.resize(needed + 1); // for null terminator??
    va_end(va);    

    va_start(va, format);
    used = vsnprintf(&output[0], output.capacity(), format.c_str(), va);
    // assert(used == needed);
    va_end(va);

    return output;
}

This works, kinda. A few things that I am not sure about are:

  1. Do I need to make room for a null terminator, or is this unnecessary?
  2. Is capacity() the right function to call here? I keep thinking length() would return 0 since the first character in the string is a '\0'.

Occasionally while writing this string's contents to a socket (using its c_str() and length()), I have null bytes popping up on the receiving end, which is causing a bit of grief, but they seem to appear inconsistently. If I don't use this function at all, no null bytes appear.


With the current standard (the upcoming standard differs here) there is no guarantee that the internal memory buffer managed by std::string is contiguous, or that the .c_str() method returns a pointer to the internal data representation (the implementation is allowed to generate a contiguous read-only block for that operation and return a pointer into it). A pointer to the actual internal data can be retrieved with the .data() member function, but note that it also returns a constant pointer, i.e. it is not intended for you to modify the contents. The buffer returned by .data() is not necessarily null terminated; the implementation only needs to guarantee null termination when c_str() is called, so even in implementations where .data() and .c_str() return the same pointer, the implementation can add the \0 to the end of the buffer when the latter is called.

The standard intended to allow rope implementations, so in principle it is unsafe to do what you are trying, and from the point of view of the standard you should use an intermediate std::vector (guaranteed contiguity, and there is a guarantee that &myvector[0] is a pointer to the first allocated block of the real buffer).

In all implementations I know of, the internal memory handled by std::string is actually a contiguous buffer. Writing through the pointer returned by .data() is still undefined behavior (writing to a constant object); even though it is incorrect, it might work (I would avoid it). You should use other libraries that are designed for this purpose, like boost::format.

About the null termination: if you finally decide to follow the path of the undefined, you would need to allocate extra space for the null terminator, since the library will write it into the buffer. Now, the problem is that unlike C-style strings, std::strings can hold null characters internally, so you will have to resize the string down to fit the largest contiguous block of memory from the beginning that contains no \0. That is probably the issue you are finding with spurious null characters. This means that the bad approach of using vsnprintf (or the rest of the family) has to be followed by str.resize( strlen( str.c_str() ) ) to discard all contents of the string after the first \0.
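To make the embedded-null problem concrete, here is a minimal sketch. The helper names trimAtFirstNull and makePaddedString are hypothetical, made up for this illustration; the second one only mimics a string that was resized one character too far and so carries the terminator as part of its contents.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper: discard everything from the first '\0' onwards,
// i.e. the str.resize( strlen( str.c_str() ) ) idiom described above.
inline void trimAtFirstNull(std::string &s)
{
    s.resize(std::strlen(s.c_str()));
}

// Hypothetical helper: builds a string whose length() includes a stray '\0',
// similar to what resize(needed + 1) leaves behind after formatting.
inline std::string makePaddedString()
{
    std::string s("Hi Bob");
    s.push_back('\0'); // the null is now ordinary content, counted by length()
    return s;
}
```

Sending makePaddedString().c_str() with makePaddedString().length() bytes over a socket would transmit seven bytes, the last one a null; after trimAtFirstNull only the six visible characters remain.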

Overall, I would advise against this approach, and insist on either getting used to the C++ way of formatting, using third party libraries (boost is third party, but it is also the most standard non-standard library), using vectors or managing memory like in C... but that last option should be avoided like the plague.

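// A safe way in C++ of using the printf family.
// snprintf is used here because the arguments are passed directly;
// vsnprintf takes a va_list instead.
std::vector<char> tmp( 1000 ); // expected maximum size
snprintf( &tmp[0], tmp.size(), "Hi %s", name.c_str() ); // assuming name to be a string
std::string salute( &tmp[0] ); // copies up to the first '\0'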


Use boost::format, if you prefer printf() over streams.

Edit: Just to make this clear, actually I fully agree with Alan, who said you should use streams.


I think that there are no guarantees that the layout of the string as referenced by &output[0] is contiguous and that you can write to it.

Use std::vector instead as a buffer which is guaranteed to have contiguous storage since C++03.

#include <cstdarg>
#include <cstdio>
#include <string>
#include <vector>

using namespace std;

string formatStdString(const string &format, ...)
{
    va_list va;
    vector<string::value_type> output(1); // ensure some storage is allocated

    va_start(va, format);
    int needed = vsnprintf(&output[0], 0, format.c_str(), va); // length without the terminator
    va_end(va);

    if (needed <= 0)
        return string(); // formatting error or empty result

    // vsnprintf writes at most size - 1 characters followed by a '\0',
    // so the buffer needs one extra element for the terminator.
    output.resize(needed + 1);

    va_start(va, format);
    int used = vsnprintf(&output[0], output.size(), format.c_str(), va); // use size()
    // assert(used == needed);
    va_end(va);

    return string(output.begin(), output.begin() + needed); // drop the terminator
}

NOTE: You'll have to set an initial size to the vector as the statement &output[0] can otherwise attempt to reference a non-existing item (as the internal buffer might not have been allocated yet).


1) You do not need to make space for the null terminator.
2) capacity() tells you how much space the string has reserved internally. length() tells you the length of the string. You probably don't want capacity() here.
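A small sketch of the difference, in case it helps. The struct Sizes and the two functions are hypothetical names used only for this illustration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical pair of values reported by a std::string.
struct Sizes
{
    std::size_t length;
    std::size_t capacity;
};

// reserve() only sets aside storage: length() stays 0.
inline Sizes sizesAfterReserve(std::size_t n)
{
    std::string s;
    s.reserve(n);
    Sizes r = { s.length(), s.capacity() };
    return r;
}

// resize() changes the contents: length() becomes n,
// and capacity() is at least that large.
inline Sizes sizesAfterResize(std::size_t n)
{
    std::string s;
    s.resize(n);
    Sizes r = { s.length(), s.capacity() };
    return r;
}
```

The exact capacity() value is up to the implementation, so code should only rely on it being no smaller than length().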


The std::string class takes care of the null terminator for you.

However, as pointed out, since you're using vsnprintf on the raw underlying string buffer (C anachronisms die hard...), you will have to ensure there is room for the null terminator.
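As a rough sketch of what "room for the null terminator" means in practice, here is one way the two-pass approach could look. It assumes a C++11 compiler, where std::string storage is guaranteed contiguous and va_copy is available; it is not meant as the definitive implementation. Note that it takes a const char * rather than a const string &, because calling va_start with a parameter of reference type is undefined behavior.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>

// Sketch only: assumes C++11 (contiguous std::string storage, va_copy).
std::string formatStdString(const char *format, ...)
{
    va_list va;
    va_start(va, format);

    // A va_list cannot be reused after vsnprintf has consumed it,
    // so keep a copy for the second pass.
    va_list vaCopy;
    va_copy(vaCopy, va);

    // First pass: ask how many characters are needed, excluding the '\0'.
    const int needed = std::vsnprintf(nullptr, 0, format, va);
    va_end(va);

    if (needed < 0)
    {
        va_end(vaCopy);
        return std::string(); // encoding or formatting error
    }

    // One extra character so vsnprintf has room to write the terminator.
    std::string output(static_cast<std::size_t>(needed) + 1, '\0');

    // Second pass: pass size(), not capacity().
    const int used = std::vsnprintf(&output[0], output.size(), format, vaCopy);
    va_end(vaCopy);
    assert(used == needed);
    (void)used; // silence unused-variable warnings when NDEBUG is defined

    // Drop the terminator so length() matches the visible text.
    output.resize(static_cast<std::size_t>(needed));
    return output;
}
```

With the final resize in place, c_str() and length() agree, so no null byte is sent when the result is written to a socket.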


My implementation for variable argument lists for functions is like this:

std::string format(const char *fmt, ...)
{
  using std::string;
  using std::vector;

  string retStr("");

  if (NULL != fmt)
  {
     va_list marker; // va_list is not portably a pointer, so do not assign NULL

     // initialize variable arguments
     va_start(marker, fmt);

     // Get formatted string length adding one for NULL
     size_t len = _vscprintf(fmt, marker) + 1;

     // Create a char vector to hold the formatted string.
     vector<char> buffer(len, '\0');
     int nWritten = _vsnprintf_s(&buffer[0], buffer.size(), len, fmt, marker);

     if (nWritten > 0)
     {
        retStr = &buffer[0];
     }

     // Reset variable arguments
     va_end(marker);
  }

  return retStr;
}


"To help transition from C, I want to format a std::string using printf-style formatters."

Just don't :(

If you do this, you're not actually learning C++ but coding C with a C++ compiler. It's a bad mindset, bad practice, and it propagates the problems that the std::o*stream classes were created to avoid.

"I realise stringstream is a more type-safe approach, but I find printf-style much easier to read and deal with (at least, for the time being)."

It's not a more typesafe approach. It is a typesafe approach. More than that, it minimizes dependencies, it lowers the number of issues you have to keep track of (like explicit buffer allocation and keeping track of the null char terminator) and it makes it easier to maintain your code.

On top of that, it is completely extensible / customizable:

  • you can extend locale formatting

  • you can define the i/o operations for custom data types

  • you can add new types of output formatting

  • you can add new buffer i/o types (making for example std::clog write to a window)

  • you can plug in different error handling policies.

The std::o*stream family of classes is very powerful, and once you learn to use it correctly there's little doubt you will not go back.

Unless you have very specific requirements your time will probably be much better spent learning the o*stream classes than writing printf in C++.
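As an illustration of the "define the i/o operations for custom data types" point, a minimal sketch follows. The Point type and the describe function are hypothetical, invented only for this example.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical user-defined type.
struct Point
{
    int x;
    int y;
};

// Teach every std::ostream (cout, files, string streams) how to print a Point.
std::ostream &operator<<(std::ostream &os, const Point &p)
{
    return os << '(' << p.x << ", " << p.y << ')';
}

// Formats into a std::string with no buffer sizes or terminators to track.
std::string describe(const std::string &name, const Point &p)
{
    std::ostringstream oss;
    oss << name << " is at " << p;
    return oss.str();
}
```

A printf-style format string has no standard way to accept a Point, whereas here the compiler picks the right operator<< from the argument type.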
