C string split problem
I am writing small IRC Bot, and i need to split incoming messages for easier handling. I wrote a function get_word
, which should split string. According to gdb and valgrind, problem is that function sometimes returns invalid pointer, and program fails when trying to free that pointer.
Here is the code:
char **get_word(char *str) {
char **res;
char *token, *copy;
int size = 1;
for(int i = 0; str[i] != '\0'; i++) {
if(str[i] == ' ') {
while(str[i] == ' ') {
i++;
开发者_Python百科 }
size++;
}
}
res = malloc((size + 1) * sizeof(char *));
copy = strdup(str);
token = strtok(copy, " ");
for(int i = 0; token != NULL; i++) {
res[i] = strdup(token);
token = strtok(NULL, " ");
}
free(copy);
res[size] = NULL;
return res;
}
One problem I see is with your nested loops:
Consider this input: ' \0'
The function execution reaches the for
loop, i == 0
. Then the while
loop is also entered. At the end of while
loop i == 1
. Now the incrementation statement from the for
loop is executed and i == 2
. So next you will be reading past the end of the string.
EDIT
I understand that size
is the number of words found in the input. So I'd go for something like:
for (int i = 0; str[i] != '\0'; ++i) {
if (str[i] != ' ' && (str[i + 1] == ' ' || str[i + 1] == '\0')) {
// Counting endings of words
size++;
}
}
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char **split (const char *str)
{
char **arr = NULL;
size_t cnt=0;
size_t pos, len;
arr = malloc (sizeof *arr);
for (pos = 0; str[pos]; pos += len) {
char *dup;
len = strspn(str+pos, " \t\r\n" );
if (len) continue;
len = strcspn(str+pos, " \t\r\n" );
if (!len) break;
arr = realloc (arr, (2+cnt) * sizeof *arr);
dup = malloc (1+len);
memcpy (dup, str+pos, len);
dup [len] = 0;
arr [cnt++] = dup;
}
arr[cnt] = NULL;
return arr;
}
int main(int argc, char **argv)
{
char **zzz;
for( zzz = split( argv[1] ); *zzz; zzz++) {
printf( "->%s\n", *zzz );
}
return 0;
}
The reallocation is a bit clumsy (like in the OP) and improving it is left as an exercise to the reader 8-}
As julkiewicz points out, your nested loops where you count the words could miss the terminating null on str. Additionally, if str
ends with spaces, your current code would count an extra word.
You could replace this section:
int size = 1;
for(int i = 0; str[i] != '\0'; i++) {
if(str[i] == ' ') {
while(str[i] == ' ') {
i++;
}
size++;
}
}
..with something like this:
while (*str == ' ') str++; // skip leading spaces on str
/* count words */
int size = 0;
char *s = str;
do {
if (*s && *s != ' ') {
size++; // non-space group found
while (*s && *s != ' ') s++; // skip to next space
}
while (*s == ' ') s++; // skip spaces after words
} while (*s);
..which counts the starts of groups of non-space characters, rather than groups of spaces, and watches for the terminating null in the inner loops.
You might also consider changing:
for(int i = 0; token != NULL; i++) {
..to:
for(int i = 0; token && i < size; i++) {
..just as a slightly paranoid guard in case strtok
finds more words than you counted (though it shouldn't).
I think gdb may be complaining about the fact that you never check the return value of malloc
(or strdup
) to see if it's non-NULL.
精彩评论