Parsing a string with varying number of whitespace characters in C
I'm pretty new to C, and trying to write a function that will parse a string such as:
"This (5 spaces here) is (1 space here) a (2 spaces here) string."
The function header would have a pointer to the string passed in such as:
bool Class::Parse( unsigned char* string )
In the end I'd like to parse each word regardless of the number of spaces between words, and store the words in a dynamic array.
Forgive the silly 开发者_JAVA技巧questions... But what would be the most efficient way to do this if I am iterating over each character? Is that how strings are stored? So if I was to start iterating with:
while ( (*string) != '\0' ) {
--print *string here--
}
Would that be printing out
T
h
i... etc?
Thank you very much for any help you can provide.
from http://www.cplusplus.com/reference/clibrary/cstring/strtok/
/* strtok example */
#include <stdio.h>
#include <string.h>
int main ()
{
char str[] ="- This, a sample string.";
char * pch;
printf ("Splitting string \"%s\" into tokens:\n",str);
pch = strtok (str," ,.-"); /* split the string on these delimiters into "tokens" */
while (pch != NULL)
{
printf ("%s\n",pch);
pch = strtok (NULL, " ,.-"); /* split the string on these delimiters into "tokens" */
}
return 0;
}
Splitting string "- This, a sample string." into tokens:
This
a
sample
string
First of all, C does not have classes, so in a C program you would probably define your function with a prototype more like one of the following:
char ** my_prog_parse(char * string) {
/* (returns a malloc'd array of pointers into the original string, which has had
* \0 added throughout ) */
char ** my_prog_parse(const char * string) {
/* (returns a malloc'd NULL-terminated array of pointers to malloc'd strings) */
void my_prog_parse(const char * string, char buf, size_t bufsiz,
char ** strings, size_t nstrings)
/* builds a NULL-terminated array of pointers into buf, all memory
provided by caller) */
However, it is perfectly possible to use C-style strings in C++...
You could write your loop as
while (*string) { ... ; string++; }
and it will compile to exactly the same assembler on a modern optimizing compiler. yes, that is a correct way to iterate through a C-style string.
Take a look at the functions strtok
, strchr
, strstr
, and strspn
... one of them may help you build a solution.
I wouldn't do any non-trivial parsing in C, it's too laborious, the language is not suitable for that. But if you mean C++, and it looks like you do, since you wrote Class::Parse, then writing recursive descent parsers is pretty easy, and you don't need to reinvent the wheel. You can take Spirit for example, or AXE, if you compiler supports C++0x. For example, your parser in AXE can be written in few lines:
// assuming you have 0-terminated string
bool Class::Parse(const char* str)
{
auto space = r_lit(' ');
auto string_rule = "This" & r_many(space, 5) & space & 'a' & r_many(space, 2)
& "string" & r_end();
return string_rule(str, str + strlen(str)).matched;
}
精彩评论