Line counting and abberant results
I'm writing a utility to count the lines in a given file via the Unix command line. Normally this would be dead simple for me, but apparently I'm having a major off night. The goal of this program is to take in an unknown number of files from the command line, read them into a buffer and check for the newline character. Sounds simple?
int size= 4096;
int main(int argc, char *argv[]){
int fd, i, j, c, fileLines, totalLines;
char *buf= (char *)malloc(size); //read buffer
for (i=2; i<argc; i++){ //get first file
fileLines=1;
if ((fd=open(argv[i], O_RDONLY))!= -1){ //open, read, print file count, close
while ((c= read(fd, buf, size))!= 0){
for (j=0; j<size; j++){
if (buf[j] == '\n')
fileLines++;
}
}
}
printf("%s had %d lines of text\n", argv[i], fileLines);
totalLines+= fileLines;
close(fd);
}
printf("%d lines were counted overall\n", totalLines);
return 0;
}
I have two problems. The first is that the first printf statement is never executed outside of the debugger. The second thing is the totalLines printout should be roughly 175K lines, but the printed value is about 767 times larger.
I'm having trouble understanding this, because all the relevant variables have been declared out of scope from their modification, but that still doesn't explain why the first print statemeent and line counter update is ignored ou开发者_运维百科tside of the debugger along with the abberant totalLines result
Any help is appreciated.
ANSWER
Two changes were suggested.
The first was to changej<size
to j<c
. While this was not the solution required, it follows good coding convention
The second was to change i=2
to i=1
. The reason I had the original start variable was the way I started the debugger executable. In the gdb command line, I entered in run lc1 f1.txt
to start the debugger. This resulted in the arglist having three variables, and I didn't know that run f1.txt
was perfectly suitable, since my professor introduced us to gdb by using the first example.
You're not initializing totalLines
. You increment it inside of your loop, but you don't set it to 0 when you first declare it.
Also, why do you start from i=2
? This is the third command-line argument, and the second parameter to your program. Is this what you intended, or did you want to start from the first parameter to your program?
And as others have pointed out, you should have j < c
instead of j < size
.
Your loop is wrong. It should be j=0; j<c; j++
. That's probably not directly responsible for the errors you're seeing but will definitely cause problems.
Did you try stepping through the code with a debugger?
Consider: ./program file.txt
argv[0] is "program"
argv[1] is "file.txt"
which means your for
loop starts from the wrong index, and if you are passing only 1 file through the cmd line your code will never enter in that loop! It should start at index 1:
for (i=1; i<argc; i++){
Do yourself a favor and initialize all variables when you declare them. Is the only way to ensure that there will be no garbage on those memory locations.
First, excellent question. :) All the necessary code, well stated, and it's obvious you've done your work. :)
How are you starting your program when in the debugger? I think the argv[2]
starting point might be related to not reaching the printf()
, but it would depend upon how you're starting. More details below.
A few comments:
int size= 4096;
Typically, C preprocessor macros are used for this kind of magic number. I know your teachers probably said to never use the preprocessor, but idiomatic C would read:
#define SIZE 4096
for (i=2; i<argc; i++){ //get first file
Try i=1
-- argv[0]
is the name of the program, argv[1]
is going to be the first command line argument -- presumably if someone calls it via ./wc foo
you want to count the number of lines in the file foo
. :) (Also, you want the loop to terminate. :) Of course, if you're trying to write a replacement for wc -l
, then your loop is alright, but not very helpful if someone screws up the arguments. That can safely be kept as a project for later. (If you're curious now, read the getopt(3)
manpage. :)
if ((fd=open(argv[i], O_RDONLY))!= -1){
while ((c= read(fd, buf, size))!= 0){
for (j=0; j<size; j++){
You are ending the loop at j<size
-- but you only read in c
characters in the last block. You're reading left-over garbage on the last block. (I wouldn't be surprised if there are generated files in /proc/
that might return short reads out of convenience for kernel programmers.)
if (buf[j] == '\n')
fileLines++;
}
}
}
printf("%s had %d lines of text\n", argv[i], fileLines);
totalLines+= fileLines;
This is the first time you've assigned to totalLines
. :) It is liable to have garbage initial value.
close(fd);
You should probably move the close(fd);
call into the if((fd=open()))
block; if the open failed, this will call close(-1);
. Not a big deal, but if you were checking the close(2)
error return (always good practice), it'd return a needless error.
}
Hope this helps!
You're probably aware of wc, but I'll mention it just in case.
I know it doesn't directly help you debug your specific problem, but maybe you could glance at the source code and/or use it to verify that your program is working.
You have logical error in for() loop. You should use "bytes read" instead "read up to", what I mean in your code use "c" instead "size" in for()
精彩评论