C programming's (K&R 1-22) fold input problem

I'm a Delphi, Ruby, and JavaScript programmer who is finally learning C, starting with K&R. I'm trying my best not to jump ahead and use libraries and concepts not yet introduced. Since this is the chapter one tutorial, I'm sticking with just a few language features and would like to keep it that way.

1-22 for the unfamiliar:

Write a program to ``fold'' long input lines into two or more shorter lines after the last non-blank character that occurs before the n-th column of input.

Make sure your program does something intelligent with very long lines, and if there are no blanks or tabs before the specified column.

I've made it to 1-22 without seeking outside help, but I've been wrestling with 'mostly' working versions of 1-22. I think my algorithmic..erm..choice stinks.

Thus far I've decided to fold input at 40 characters. Using integer division (/) and modulus (%), I figure out how many times I need to fold each line, jump to that column, and count backwards until I hit a space. The space is replaced by '\n'. Then I repeat 40 characters further out.

If no spaces are encountered, we'll kick in a hard fold at each column stop.

I'm getting some lines sneaking past my boundary and wonder if I shouldn't instead read the input into char line[], then copy to a buffer 40 chars at a time, fold in the buffer, and copy the buffer back to line[]. But that seems like a ton of work, especially without string.h.

Code is below, I'm looking for hints in the right direction vs solutions as I think I'm nearly there.

#include <stdio.h>

#define MAXBUF 1000
#define WRAP 20

int getline(char s[],int lim);

int main(void)
{
    int len;                /* length of each input */
    int folds;              /* how many folds we've performed */
    int lines;              /* lines the input breaks down to given len */
    int index;              /* index of fold */
    int didfold;            /* true (1) if we were able to fold on a ' ' */
    int i;                  /* loop counter */
    char line[MAXBUF+1];    /* input line */
    char buf[MAXBUF+1];     /* temp buffer for copying */

    while ((len=getline(line,MAXBUF)) > 0)
    {
        /* how many times should we fold the input
           account for left overs
        */
        lines = len / WRAP;
        if (len % WRAP != 0)
            ++lines;
        /* init */
        folds = 1;

        while (lines>0)
        {
            didfold = 0;
            for (index=(WRAP*folds)-1;index>0 && !didfold;--index)
            {
                if (line[index] == ' ')
                {
                    line[index] = '\n';
                    didfold = 1;
                    --lines;
                    ++folds;
                }
            }
            // if (!didfold)
            // {
            //  i = 0;
            //  while ((buf[i] = line[i]) != '\0');
            //      ++i;
            //  for(index=i=0;buf[i]!='\0';++index,++i)
            //  {
            //      line[index] = buf[i];
            //      if (index==(WRAP*folds)) 
            //      {
            //          ++i;
            //          line[i] = '\n';
            //          didfold = 1;
            //          ++folds;
            //          linelength -= WRAP * folds;
            //      }
            //  }
            // }

        }
        printf("--------------------|||||\n");
        printf("%s",line);
    }
    return 0;
}


int getline(char s[],int lim)
{
    int i,c;

    for (i=0;i<=lim && ((c = getchar()) != EOF) && c != '\n'; ++i)
        s[i] = c;
    if (c == '\n') {
        s[i] = '\n';
        ++i;
    }
    s[i] = '\0';

    return i;
}

I have another version that indexes itself to column - 40 and counts forward with even more issues.

UPDATE

The following has bugs I'm working out so I'm not done by a long shot but..

Am I headed in the right direction? I want to make sure I grasp the classic UNIX text filter. So far this code feels better but still hackish. I just don't feel like I've grasped a key concept that I need to finish this with nice code.

/* Exercise 1-22. Write a program to ``fold'' long input lines into two or more shorter
   lines after the last non-blank character that occurs before the n-th column of input.

   Make sure your program does something intelligent with very long lines, and if there
   are no blanks or tabs before the specified column. */

#include <stdio.h>

#define WRAP 20

int main(void)
{
    char buf[WRAP+1];
    int bufpos = 0;
    int last_whitespace = -1;

    for(bufpos=0;bufpos<(WRAP-1);++bufpos) {
        putchar('-');
    }
    putchar('|');
    putchar('\n');
    bufpos=0;

    while ((buf[bufpos]=getchar())!=EOF) {
        // if at buffer or newline
        if (bufpos==(WRAP-1) || buf[bufpos] == '\n' || buf[bufpos] == '\t') {
            ++bufpos;
            buf[bufpos] = '\0';

            if (buf[bufpos]==' ' || buf[bufpos] == '\n') {
                // whitespace, flush buf and go.
                printf("%s",buf);
            } else {
                if (last_whitespace>0) {
                    buf[last_whitespace] = '\n';
                    printf("%s",buf);
                } else {
                    //hard fold!
                    printf("%s",buf);
                    putchar('\n');
                }
            }           
            for (bufpos=0;bufpos<WRAP;++bufpos)
                buf[bufpos] = '\0';
            bufpos=0;
            last_whitespace=-1;
        } else {
            if (buf[bufpos]==' ')
                last_whitespace = bufpos;
            ++bufpos;   
        }
    }
    return 0;
}


Read through each line, character-by-character, and maintain a few pointers (or, if you're not using pointers yet, offsets). One for the "beginning of the line", which will start off pointing at the beginning of the line, one for the last whitespace you saw, which will start off NULL (if you're using offsets, -1 will do instead), and one for the current reading position.

Then, whenever you reach some whitespace, you should check to see whether you can output everything from the previous whitespace up to (but not including) the current whitespace without going beyond WRAP characters. If you can, then output it right away, and update the previous whitespace pointer to point to the current whitespace. If you can't, then output a newline instead of the previous whitespace, and update the beginning-of-line pointer as well as the last-whitespace pointer.

Now the only thing left is to "handle really long lines", which you can do by seeing if the beginning of the line and the last whitespace seen are both in the same place, but we're still beyond WRAP columns. That means that we have a word that won't fit on one line. In that case, we should probably insert a linebreak right here, reset the beginning-of-line, and keep going.

Make sure that you also print anything that hasn't been printed yet when you reach the end of the input line, as well as outputting a final newline.

The nice thing about this algorithm is that it doesn't actually need an array of lines to work — since it processes character-by-character it could work with very little change reading the input file one character at a time, given a buffer of at least WRAP characters.
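
For what it's worth, here is a minimal sketch of the offset-based approach described above. It is only one way to do it, not a definitive solution. The name fold_line, its parameters, the choice to count a tab as a single column, and the choice to drop the blank that lands exactly at the fold are all my assumptions. It writes into a caller-supplied out array, which must have room for the input plus one extra newline per wrap characters.

```c
/* Sketch: copy `in` to `out`, breaking it so that no output line holds
   more than `wrap` characters (wrap >= 1). Returns the length of `out`.
   Assumption: a tab counts as one column, like any other character. */
int fold_line(const char in[], char out[], int wrap)
{
    int i;             /* current reading position in `in` */
    int c;             /* character being examined */
    int o = 0;         /* next free position in `out` */
    int start = 0;     /* offset in `out` where the current line begins */
    int last_ws = -1;  /* offset in `out` of the last blank on this line */

    for (i = 0; in[i] != '\0'; ++i) {
        c = in[i];
        if (c == '\n') {                    /* input line ends: start afresh */
            out[o++] = '\n';
            start = o;
            last_ws = -1;
        } else if (o - start < wrap) {      /* still room on this line */
            if (c == ' ' || c == '\t')
                last_ws = o;
            out[o++] = c;
        } else if (c == ' ' || c == '\t') { /* full, and a blank arrives: */
            out[o++] = '\n';                /* break here, dropping the blank */
            start = o;
            last_ws = -1;
        } else if (last_ws >= 0) {          /* full: fold at the last blank */
            out[last_ws] = '\n';
            start = last_ws + 1;
            last_ws = -1;
            out[o++] = c;
        } else {                            /* no blank at all: hard fold */
            out[o++] = '\n';
            start = o;
            out[o++] = c;
        }
    }
    out[o] = '\0';
    return o;
}
```

With wrap set to 9, the input "aaaa bbbb cccc" comes out as "aaaa bbbb" and "cccc" on separate lines. With wrap set to 4, "abcdefghij" has no blanks and gets hard-folded into "abcd", "efgh", and "ij".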


A thought on why some lines are sneaking past your boundary: When you go to index=(WRAP*folds)-1, this puts you at 19 initially - you then start counting backwards till you hit a space. Let's say this space is at index 17. Insert a newline. You then increment 'folds' and recalculate (index=(WRAP*folds)-1), which is now 39. Let's say index 39 is a space, so you immediately put in a newline. The line between the two newlines you have put in is over 20 characters long!

What I would recommend instead is to initialize index to WRAP-1 right before the for loop, then increment index by WRAP every time you restart the loop (after each newline is created). This would prevent the next starting point for checking and inserting each newline from being more than WRAP spaces past where the last one ended.

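index = WRAP-1;
while (lines > 0) {
    for (didfold=0; index>0 && !didfold; --index)
    {
        if (line[index] == ' ')
        {
            line[index] = '\n';
            didfold = 1;
            --lines;
            /* jump ahead WRAP from this break (the loop's --index then backs up one) */
            index = index+WRAP;
        }
    }
    if (!didfold)
        break;      /* no blank found to fold on: stop rather than loop forever */
}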

I hope this gets the idea across...
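
In case it helps, here is the same idea as a small self-contained function. Treat it as a sketch under my own assumptions: the name fold_in_place and its parameters are hypothetical, and len is taken to be the number of characters before the trailing newline. Each backward search starts exactly wrap characters past the start of the current piece, so a piece can never be longer than wrap when a blank is available. Because it works in place, it cannot insert a hard fold. If a word is longer than wrap, it is left whole and the break goes at the next blank after it.

```c
/* Sketch: replace blanks in line[0..len-1] with newlines so that each
   piece is at most `wrap` characters wherever a blank is available.
   Returns the number of folds made. */
int fold_in_place(char line[], int len, int wrap)
{
    int start = 0;   /* index of the first character of the current piece */
    int index;       /* position being examined */
    int folds = 0;

    while (len - start > wrap) {      /* the rest does not fit on one line */
        /* search backwards from the furthest allowed break position */
        index = start + wrap;
        while (index > start && line[index] != ' ')
            --index;

        if (index == start) {
            /* no usable blank in the window: skip ahead to the next one,
               leaving the over-long word unbroken */
            index = start + wrap;
            while (index < len && line[index] != ' ')
                ++index;
        }
        if (index < len) {
            line[index] = '\n';
            ++folds;
        }
        start = index + 1;            /* next piece begins after the break */
    }
    return folds;
}
```

For example, with wrap set to 5 the 14-character line "aa bb cc dd ee" gets its blanks at indexes 5 and 11 replaced, giving "aa bb", "cc dd", and "ee".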


I think that when you first calculate the length of your line in getline(), you should simply count characters up to EOF without considering spaces. Then, once you are back in main(), take spaces and the other cases into consideration.
