libarchive reads too many chars when extracting a file
I've written a C program to extract files from a tar archive using libarchive.
I'd like to extract a file from this archive and print it to standard output, but I get extra characters at the end. The extra output looks like garbage, but it comes from another file (possibly one adjacent to it in the archive). I expect the output to end at </html>.
Here is the code that reads this tar file:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "archive.h"
#include "archive_entry.h"
int main (int argc, const char * argv[]) 
{
    struct archive *a;
    struct archive_entry *entry;
    int r;
    int64_t entry_size;
    a = archive_read_new();
    archive_read_support_compression_none(a);
    archive_read_support_format_tar(a);
    r = archive_read_open_filename(a, "0000.tar", 1024);
    if (r != ARCHIVE_OK)
    {
        printf("archive not found");
    }
    else 
    {
        while (archive_read_next_header(a, &entry) == ARCHIVE_OK) 
        {
            const char *currentFile = archive_entry_pathname(entry);
            char *fileContents;
            entry_size = archive_entry_size(entry); //get the size of the file
            fileContents = malloc(entry_size); //alloc enough for string - from my testing I see that this is how many bytes tar and ls report from command line
            archive_read_data(a, fileContents, entry_size); //read data into fileContents string for the HTML file size
            if(strcmp(currentFile, "vendar-definition.html") == 0)
            {
                printf("file name = %s, size = %lld\n", currentFile, entry_size);
                printf("%s\n\n", fileContents); //this output over-reads chars from another file in this tar file
            }           
            free(fileContents); //free the C string because I malloc'd
        }
    }
    printf("exit");
    return 0;
}
Environment: libarchive 2.8.3 compiled on Mac OS X 10.6.3 with gcc 4.2 (x86_64).
ls -l vendar-definition.html gives me 1921 for the file size, and so does tar tfv 0000.tar | grep vendar-definition.html. The size printed by my C program matches as well, so the size itself seems correct to me.
Two possibilities I can see for why my output is not as expected:
- I've made a beginner's mistake or
- multibyte characters in the archived files have something to do with it.
I could be very wrong, but that doesn't look like a null-terminated string (I don't think archive_read_data takes care of that). Append a null character ('\0') after the data you read, or see this and tell us how it goes.
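To make the null-terminator suggestion concrete, here is a minimal sketch. It assumes the entry is plain text with no null bytes in the middle. make_c_string is a hypothetical helper written for this illustration, not part of libarchive. It allocates one byte more than the data size and writes the terminator there.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not a libarchive function): copy `size` raw bytes
 * into a freshly allocated buffer and append a '\0' so the result can be
 * passed to printf("%s"). Returns NULL if allocation fails.
 * The caller must free() the result. */
char *make_c_string(const void *data, size_t size)
{
    assert(data != NULL || size == 0);

    char *s = malloc(size + 1);   /* one extra byte for the terminator */
    if (s == NULL)
        return NULL;

    if (size > 0)
        memcpy(s, data, size);
    s[size] = '\0';               /* %s stops here instead of running on */
    return s;
}
```

In the loop from the question, the same idea would probably look like malloc(entry_size + 1) followed by fileContents[entry_size] = '\0' after the call to archive_read_data.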
I suspect you're not reading too many chars, but only printing too many.
You're outputting the file contents using the %s specifier to printf, which expects the input to be a null-terminated string. The contents of a file in the archive may not be null-terminated, and may contain arbitrary nulls in the middle.
Try outputting like this instead:
fwrite(fileContents, sizeof(char), entry_size, stdout);
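A slightly fuller sketch of the fwrite approach, in case it helps. write_bytes is a hypothetical wrapper name chosen for this example. It writes exactly the number of bytes you give it, so null bytes inside the data are passed through and nothing past the end of the buffer is printed.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical wrapper: write exactly `size` bytes from `buf` to `out`.
 * Unlike printf("%s"), this does not look for a terminating '\0'.
 * Returns 0 on success, -1 if fewer bytes were written than requested. */
int write_bytes(FILE *out, const void *buf, size_t size)
{
    assert(out != NULL);

    if (size == 0)
        return 0;                 /* nothing to write */

    size_t written = fwrite(buf, 1, size, out);
    return written == size ? 0 : -1;
}
```

With the code from the question this would be called as write_bytes(stdout, fileContents, entry_size). archive_read_data returns the number of bytes it actually read (or a negative value on error), so passing that count instead of entry_size is likely the safer choice.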
 