
Take apart XML text in C

My objective is to read an XML text file and split each word and tag onto its own line in an array.

For example, if I input this text into my program:

<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>

I would get this:

<note>
<to>
Tove
</to>
<from>
...

Right now I have code that does this successfully, but only for the words, so instead of the list above I get:

note
to
Tove
...

I want to keep the tags, or I won't be able to do what I want with the result. I have been trying to get it to add the tags as well, but have been failing.

Okay so here is my code:

//While the file is not empty
while(fgets(buffer, sizeof(buffer), stdin) != NULL){
    int first = 0;
    int last = 0;

    //While words are left in line
    while(last < INITIAL_SIZE && buffer[last] != '\0'){
        int bool = 0;
        //Tag detected
        if(buffer[last] == '<'){
            while(buffer[last] != '>'){
                last++;
            }

            bool = 1;
        }else{
            //While more chars are in the word
            while(last < INITIAL_SIZE && isalpha(buffer[last])){
                last++;
            }
        }
        //Word detected
        if(first < last){
            //Words array is full, add more space
            if(numOfWords == sizeOfWords){
                sizeOfWords = sizeOfWords + 10;
                words = (char **) realloc(words, sizeOfWords*sizeof(char *));
            }               
            //Allocate memory for array
            words[numOfWords] = (char *) calloc(last-first+1, sizeof(char));


            for(i = 0; i < (last-first); i++){
                words[numOfWords][i] = buffer[first + i];
            }
            //Add terminator to "new word"
            words[numOfWords][i] = '\0';
            numOfWords++;   
        }           
        //Move "Array Pointers" accordingly
            last++;
            first = last;
    }       
}

Does anyone have any idea? With the code above, this is the printout:

<note
<to
Tove
to 
<from
Jani
from
<heading
...
Don
t
forget
me
this
weekend
</body
</note

So after this wall of text, does anyone have any idea how I can modify my current code to get this to work? Or does anyone have an alternative?


My basic way of thinking is this:

first is the first letter included in the current word;

last is the first letter not included in the current word.

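When your program detects a tag, the inner loop stops on the > without including it, so the tag is stored as <note instead of <note>. Once the tag branch steps past the > itself, the last++ at the end of the outer loop is no longer needed there. That last++ is also what swallows the < of a closing tag that directly follows a word, which is why you get to instead of </to>. In addition, you check only for \0 as the end of the string, but not for \n as the end of the line.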

Here's my solution:

while (fgets(buffer, sizeof(buffer), stdin) != NULL) {
    int first = 0;
    int last = 0;

    //While words are left in line
    while (last < INITIAL_SIZE && buffer[last] != '\0' 
          && buffer[last] != '\n')  { // <--------- Add this
        int Bool = 0;
        //Tag detected
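        if (buffer[last] == '<') {
            //Stop at '>' but never run past the end of the line
            while (buffer[last] != '>' && buffer[last] != '\0') {
                last++;
            }

            if (buffer[last] == '>') {
                last++; // <--------- This
            }
            Bool = 1;
        } else {
            //While more chars are in the word
            while (last < INITIAL_SIZE && isalpha((unsigned char) buffer[last])) {
                last++;
            }
            //Not a letter (space, apostrophe, ...): skip it,
            //otherwise the outer loop never advances
            if (first == last) {
                last++;
                first = last;
                continue;
            }
        }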
        //Word detected
        if (first < last) {
            //Words array is full, add more space
            if (numOfWords == sizeOfWords) {
                sizeOfWords = sizeOfWords + 10;
                words = (char **) realloc(words,
                        sizeOfWords * sizeof(char *));
            }
            //Allocate memory for array
            words[numOfWords] = (char *) calloc(last - first + 1,
                    sizeof(char));

            for (i = 0; i < (last - first); i++) {
                words[numOfWords][i] = buffer[first + i];
            }
            //Add terminator to "new word"
            words[numOfWords][i] = '\0';
            numOfWords++;
        }
        //Move "Array Pointers" accordingly
        first = last; // <--------- And change this
    }
}
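
The first/last convention described above (first is the first character of the token, last is the first character past it) can also be tried in isolation. The following is only a minimal sketch, not the code from the question: the names next_token and is_word_char are hypothetical, and it assumes that a word is any run of characters other than whitespace and <, which is a looser rule than the isalpha test used above.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: a word character is anything except
   the terminator, whitespace and '<'. */
static int is_word_char(char c)
{
    return c != '\0' && c != '<' && !isspace((unsigned char) c);
}

/*
 * Find the next token in line, starting at *pos.
 * On success, returns 1, sets *first to the token's first character,
 * sets *last to the first character NOT in the token, and moves *pos
 * to *last. Returns 0 when only whitespace (or nothing) is left.
 */
static int next_token(const char *line, size_t *pos,
                      size_t *first, size_t *last)
{
    size_t i = *pos;

    /* Skip separators between tokens. */
    while (isspace((unsigned char) line[i])) {
        i++;
    }
    if (line[i] == '\0') {
        *pos = i;
        return 0;
    }

    *first = i;
    if (line[i] == '<') {
        /* Tag: scan to '>' but never past the terminator. */
        while (line[i] != '>' && line[i] != '\0') {
            i++;
        }
        if (line[i] == '>') {
            i++;    /* include the '>' in the token */
        }
    } else {
        /* Word: scan until whitespace, '<' or the terminator. */
        while (is_word_char(line[i])) {
            i++;
        }
    }
    *last = i;
    *pos = i;
    return 1;
}
```

Called repeatedly on the line <to>Tove</to>, this yields the index ranges [0,4), [4,8) and [8,13), that is <to>, Tove and </to>. Because last always points one past the token, last - first is the length to copy and no extra last++ is needed between tokens.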


It is highly doubtful that anyone will ever use this, but I got it to work using Boolean-style logic:

while (fgets(buffer, sizeof(buffer), stdin) != NULL) {
    int first = 0;
    int last = 0;

    //While words are left in line
    while (last < INITIAL_SIZE && buffer[last] != '\0' && buffer[last] != '\n'){
        int Bool = 0;
        //Tag detected
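        if (buffer[last] == '<'){
            //Stop on '>' but never run past the end of the line
            while (buffer[last] != '>' && buffer[last] != '\0')
                last++;
            //Only count the '>' if one was actually found
            if (buffer[last] == '>')
                Bool = 1;
        }else
            //While more chars are in the word
            while(last < INITIAL_SIZE && buffer[last] != '\0'
                  && !isspace((unsigned char) buffer[last]) && buffer[last] != '<')
                last++;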

        //Word detected
        if (first < last) {
            //Words array is full, add more space
            if (numOfWords == sizeOfWords) {
                sizeOfWords = sizeOfWords + 10;
                words = (char **) realloc(words, sizeOfWords * sizeof(char *));
            }
            //Allocate memory for array: one extra byte for the '>' or the
            //re-added '<', plus one for the terminator
            words[numOfWords] = (char *) calloc(last - first + 2, sizeof(char));

            int xHolder = 0;
            if(buffer[first] == '/'){
                words[numOfWords][0] = '<';
                xHolder++;
                Bool++;
            }
            for (i = 0; i < (last - first + Bool); i++) {
                words[numOfWords][xHolder] = buffer[first + i];
                xHolder++;
            }
            //Add terminator to "new word"
            words[numOfWords][i] = '\0';
            numOfWords++;
        }
        //Move "Array Pointers" accordingly, but never step past the terminator
        if (buffer[last] != '\0')
            last++;
        first = last;
    }
}


The best advice I can give here is what was given to me when I posted this on comp.lang.c.

Functions

Pretty much everywhere you've written a full-line comment, the important words from the comment should be the name of a function called at that point.

ProcessFile
    while(fgets..)
        ProcessWords()

ProcessWords
    if(DetectTag)
        ...

Refactoring in this way makes complicated code much easier to read (for you, too). It allows your top-level logic to read like pseudocode, while all the fiddly bits can be grouped together. Maybe someday tags will use curly braces, so put your literals in #defines or even enums. That way simple syntax changes can be made easily later on.

The goal is that you should be able to see the entire function body on the screen at once. This allows you to verify each piece separately.
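
As a rough illustration of that advice, here is one way the loop from the question might be split up. This is a sketch under assumptions, not the original poster's code: struct word_list and the function names detect_tag, scan_tag, scan_word, add_word and process_line are all hypothetical, each named after one of the full-line comments, and the tag delimiters live in #defines.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define TAG_OPEN  '<'
#define TAG_CLOSE '>'
#define GROW_BY   10

/* Hypothetical container replacing words, numOfWords and sizeOfWords. */
struct word_list {
    char **items;
    size_t count;
    size_t capacity;
};

/* "Tag detected" */
static int detect_tag(const char *s)
{
    return *s == TAG_OPEN;
}

/* Length of the tag starting at s, including TAG_CLOSE if it is present. */
static size_t scan_tag(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0' && s[n] != TAG_CLOSE) {
        n++;
    }
    return s[n] == TAG_CLOSE ? n + 1 : n;
}

/* "While more chars are in the word" */
static size_t scan_word(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0' && s[n] != TAG_OPEN && !isspace((unsigned char) s[n])) {
        n++;
    }
    return n;
}

/* "Words array is full, add more space" and "Allocate memory for array".
   Returns 0 on success, -1 if an allocation fails. */
static int add_word(struct word_list *list, const char *s, size_t len)
{
    if (list->count == list->capacity) {
        char **grown = realloc(list->items,
                               (list->capacity + GROW_BY) * sizeof(char *));
        if (grown == NULL) {
            return -1;
        }
        list->items = grown;
        list->capacity += GROW_BY;
    }
    char *copy = malloc(len + 1);
    if (copy == NULL) {
        return -1;
    }
    memcpy(copy, s, len);
    copy[len] = '\0';
    list->items[list->count++] = copy;
    return 0;
}

/* "While words are left in line" */
static int process_line(struct word_list *list, const char *line)
{
    while (*line != '\0') {
        if (isspace((unsigned char) *line)) {
            line++;             /* skip separators */
            continue;
        }
        size_t len = detect_tag(line) ? scan_tag(line) : scan_word(line);
        if (add_word(list, line, len) != 0) {
            return -1;
        }
        line += len;
    }
    return 0;
}
```

The fgets loop then shrinks to a single call to process_line(&list, buffer) per line, starting from a zero-initialized struct word_list; the caller is responsible for freeing each item and the items array afterwards. If the tag syntax ever changes, only TAG_OPEN and TAG_CLOSE need to be touched.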


You might be having a problem in your inner loop.
