Why is strtok() Considered Unsafe?
Which feature(s) of strtok are unsafe (in terms of buffer overflow) that I need to watch out for?
What's a little weird to me is that strtok_s (which is "safe") in Visual C++ has an extra "context" parameter, but it looks like it's the same in other ways... is it the same, or is it actually different?
According to the strtok_s section of this document:
6.7.3.1 The strtok_s function
The strtok_s function fixes two problems in the strtok function:
- A new parameter, s1max, prevents strtok_s from storing outside of the string being tokenized. (The string being divided into tokens is both an input and output of the function since strtok_s stores null characters into the string.)
- A new parameter, ptr, eliminates the static internal state that prevents strtok from being re-entrant (Subclause 1.1.12). (The ISO/IEC 9899 function wcstok and the ISO/IEC 9945 (POSIX) function strtok_r fix this problem identically.)
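As a rough illustration of the second bullet, here is a minimal sketch of the explicit-state style using POSIX strtok_r (the bounds-checked strtok_s from the quoted document is missing from many C libraries, so it is not used here). The helper name split_tokens and its out/max_out parameters are made up for this example.

```c
#define _POSIX_C_SOURCE 200809L  /* expose strtok_r under strict -std=c99/c11 */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits `s` in place and stores up to `max_out`
 * token pointers in `out`. Returns the number of tokens stored.
 * All parsing state lives in the local `save`, not in a hidden static. */
size_t split_tokens(char *s, const char *delims, char **out, size_t max_out)
{
    assert(s != NULL && delims != NULL && out != NULL);

    char *save = NULL;  /* the "context" / "ptr" argument */
    size_t n = 0;

    for (char *tok = strtok_r(s, delims, &save);
         tok != NULL && n < max_out;
         tok = strtok_r(NULL, delims, &save)) {
        out[n++] = tok;  /* each token points into the caller's buffer */
    }
    return n;
}
```

Called on a writable buffer holding "a,b,,c" with "," as the delimiter, this stores three tokens (a, b and c), because runs of delimiters are skipped.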
There is nothing unsafe about it. You just need to understand how it works and how to use it. After you write your code and unit test, it only takes a couple of extra minutes to re-run the unit test with valgrind to make sure you are operating within memory bounds. The man page says it all:
BUGS
Be cautious when using these functions. If you do use them, note that:
- These functions modify their first argument.
- These functions cannot be used on constant strings.
- The identity of the delimiting character is lost.
- The strtok() function uses a static buffer while parsing, so it's not thread safe. Use strtok_r() if this matters to you.
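One way to live with the first two cautions is to tokenize a private, writable copy, so the caller's string (which may be a constant) is never touched. The sketch below assumes that is acceptable for your use; count_tokens is a hypothetical helper, not a library function.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: counts the tokens in `input` without modifying it.
 * Returns -1 if the temporary copy cannot be allocated. */
int count_tokens(const char *input, const char *delims)
{
    assert(input != NULL && delims != NULL);

    size_t len = strlen(input) + 1;  /* include the terminating NUL */
    char *copy = malloc(len);
    if (copy == NULL)
        return -1;
    memcpy(copy, input, len);

    int count = 0;
    /* strtok writes NUL bytes into `copy`, never into `input`. */
    for (char *tok = strtok(copy, delims); tok != NULL;
         tok = strtok(NULL, delims)) {
        count++;
    }

    free(copy);
    return count;
}
```

This still uses strtok's hidden static state, so the last caution (thread safety) applies unchanged.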
strtok is thread-safe in Visual C++, because it uses thread-local storage to save its state between calls. Most other implementations keep that state in a single global (static) variable.
However, even in VC++, where strtok is thread-safe, it is still a bit weird: you cannot use strtok() on different strings in the same thread at the same time. For example, this would not work well:
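    token = strtok(string, seps);
    while (token)
    {
        printf("token=%s\n", token);
        /* Starting on string2 overwrites the single saved state for string. */
        token2 = strtok(string2, seps);
        while (token2)
        {
            printf("token2=%s\n", token2);
            token2 = strtok(NULL, seps);
        }
        /* This continues from string2's state, not from string. */
        token = strtok(NULL, seps);
    }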
The reason it would not work well: each thread can keep only a single saved state in thread-local storage, and this code needs two states, one for the first string and one for the second. So while strtok is thread-safe with VC++, it is not reentrant.
What strtok_s (or strtok_r everywhere else) provides is an explicit state, and with that, tokenizing becomes reentrant.
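A minimal sketch of nested tokenizing with explicit state, assuming POSIX strtok_r is available (Visual C++'s strtok_s takes the same three arguments). The function name count_fields and the ";" and "," separators are invented for illustration: each loop gets its own state variable, so the inner loop cannot disturb the outer one.

```c
#define _POSIX_C_SOURCE 200809L  /* expose strtok_r under strict -std=c99/c11 */
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: splits `records` on ';' and each record on ','.
 * Prints every field and returns the total number of fields seen. */
int count_fields(char *records)
{
    assert(records != NULL);

    char *outer_state = NULL;  /* state for the record loop */
    int total = 0;

    for (char *rec = strtok_r(records, ";", &outer_state);
         rec != NULL;
         rec = strtok_r(NULL, ";", &outer_state)) {

        char *inner_state = NULL;  /* separate state for the field loop */
        for (char *field = strtok_r(rec, ",", &inner_state);
             field != NULL;
             field = strtok_r(NULL, ",", &inner_state)) {
            printf("field=%s\n", field);
            total++;
        }
    }
    return total;
}
```

With a writable buffer containing "a,b;c,d,e" this visits all five fields. Written with plain strtok, the inner loop would replace the outer loop's saved position.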
If you do not have a properly null-terminated string, strtok will run past the end of the buffer and you will end up with a buffer overrun. Also note (this is something that I learned the hard way) that strtok does NOT care about quoted strings. For example, with / as the delimiter, "hello/world" (quotes included) is split into the two tokens "hello and world". Notice that it splits on the / and ignores the fact that it is inside quotation marks.
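If the data arrives with a known length but no guaranteed terminator, one defensive option is to copy it into a bounded buffer and add the terminator yourself before tokenizing. This is only a sketch: count_tokens_bounded and the 256-byte TOKEN_BUF_SIZE limit are arbitrary choices made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOKEN_BUF_SIZE 256  /* arbitrary limit chosen for this sketch */

/* Hypothetical helper: counts tokens in `len` bytes of `data`, which need
 * not be NUL-terminated. Returns -1 if the data does not fit the buffer. */
int count_tokens_bounded(const char *data, size_t len, const char *delims)
{
    assert(data != NULL && delims != NULL);

    char buf[TOKEN_BUF_SIZE];
    if (len >= sizeof buf)
        return -1;          /* refuse rather than truncate silently */

    memcpy(buf, data, len);
    buf[len] = '\0';        /* guarantee termination before strtok sees it */

    int count = 0;
    for (char *tok = strtok(buf, delims); tok != NULL;
         tok = strtok(NULL, delims)) {
        count++;
    }
    return count;
}
```

The same helper also shows the quoting behaviour: given the 13 bytes of "hello/world" (quotes included) and "/" as the delimiter, it counts two tokens.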