Reading up to newline
Hi, my program reads a CSV file, so I used fgets to read one line at a time. But now the interface specification says that NUL characters may appear in a few of the columns, so I need to replace fgets with another function to read from the file. Any suggestions?
If your text stream has a NUL (ASCII 0) character, you will need to handle your file as a binary file and use fread to read the file. There are a few approaches to this.
1. Read the entire file into memory. The length of the file can be obtained by calling fseek(fp, 0, SEEK_END) and then ftell. You can then allocate enough memory for the whole file. Once in memory, parsing the file should be relatively easy. This approach is only really suitable for smallish files (probably less than 50 MB at most). For bonus marks, look at the mmap function.
2. Read the file byte by byte and add the characters to a buffer until a newline is found.
3. Read and parse bit by bit. Create a buffer that is bigger than your largest line and fill it with content from your file. You then parse and extract as many lines as you can. Add the remainder to the beginning of a new buffer and read the next bit. Using a bigger buffer will help minimize copying.
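For the read-everything-into-memory approach, something along these lines might work. It is only a sketch: slurp_file and count_lines are made-up names, the stream is assumed to be opened in binary mode ("rb"), and fseek to SEEK_END followed by ftell is assumed to report the size in bytes (true on POSIX systems, but not guaranteed by the C standard for every stream).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: load a whole binary-mode stream into memory.
 * Returns a malloc'd buffer and stores its length in *len_out,
 * or returns NULL on failure. The caller frees the buffer. */
static unsigned char *slurp_file(FILE *fp, size_t *len_out)
{
    assert(fp != NULL && len_out != NULL);

    if (fseek(fp, 0, SEEK_END) != 0)
        return NULL;
    long size = ftell(fp);              /* file length in bytes */
    if (size < 0)
        return NULL;
    rewind(fp);

    /* +1 so that an empty file never turns into malloc(0) */
    unsigned char *buf = malloc((size_t)size + 1);
    if (buf == NULL)
        return NULL;
    if (fread(buf, 1, (size_t)size, fp) != (size_t)size) {
        free(buf);
        return NULL;
    }
    *len_out = (size_t)size;
    return buf;
}

/* Count the lines in buf[0..len), using '\n' as the line terminator.
 * memchr takes an explicit length, so embedded NUL bytes are just data. */
static size_t count_lines(const unsigned char *buf, size_t len)
{
    size_t lines = 0;
    const unsigned char *p = buf;
    const unsigned char *end = buf + len;

    while (p < end) {
        const unsigned char *nl = memchr(p, '\n', (size_t)(end - p));
        lines++;
        if (nl == NULL)                 /* last line has no newline */
            break;
        p = nl + 1;
    }
    return lines;
}
```

The important part is that nothing here relies on strlen or any other function that stops at the first NUL; every length is carried explicitly.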
fgets works perfectly well with embedded null bytes. Pre-fill your buffer with '\n' (using memset), call fgets, and then use memchr(buf, '\n', sizeof buf). If memchr returns NULL, your buffer was too small and you need to enlarge it to read the rest of the line. Otherwise, you can determine whether the newline you found is the end of the line or the padding you pre-filled the buffer with by inspecting the next byte. If the newline you found is at the end of the buffer or has another newline just after it, it's from padding, and the previous byte is the null terminator inserted by fgets (not a null from the file). Otherwise, the newline you found has a null byte after it (the terminator inserted by fgets), and it's the end-of-line newline.
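Here is roughly how the memset/memchr trick could look in code. Treat it as a sketch: fgets_len is a made-up wrapper name, and it assumes fgets leaves the bytes after its terminator untouched, which common implementations do but which the C standard does not spell out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical wrapper around fgets that also reports how many bytes
 * came from the file, so embedded NUL bytes are not mistaken for the end.
 * Returns  1 if the chunk ends with the end-of-line newline,
 *          0 if it ended without one (buffer full, or EOF mid-line),
 *         -1 if nothing was read (EOF or error). */
static int fgets_len(char *buf, size_t size, FILE *fp, size_t *len_out)
{
    assert(buf != NULL && size >= 2 && len_out != NULL);

    memset(buf, '\n', size);            /* pre-fill with padding newlines */
    if (fgets(buf, (int)size, fp) == NULL)
        return -1;

    char *nl = memchr(buf, '\n', size);
    if (nl == NULL) {
        /* No newline left at all: fgets filled the buffer and its
         * terminator sits in the last byte. The line continues. */
        *len_out = size - 1;
        return 0;
    }

    size_t pos = (size_t)(nl - buf);
    if (pos + 1 < size && buf[pos + 1] == '\0') {
        /* Newline followed by the terminator: a real end of line. */
        *len_out = pos + 1;
        return 1;
    }

    /* Padding newline (next byte is more padding, or the buffer ends):
     * the byte just before it is the terminator fgets wrote. */
    *len_out = pos - 1;
    return 0;
}
```

When the return value is 0 and the stream is not at EOF, the caller would enlarge the buffer (or append to a growing one) and call again to get the rest of the line.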
Other approaches will be slow (repeated fgetc) or waste (and risk running out of) resources (loading the whole file into memory).
Use fread and then scan the block for the separator.
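A minimal sketch of what that could look like, assuming '\n' is the separator and that no line is longer than the block. The names for_each_line, line_fn, line_stats, count_line and BLOCK_SIZE are invented for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum { BLOCK_SIZE = 4096 };     /* assumed to exceed the longest line */

/* Callback invoked once per line; the line is NOT NUL-terminated and
 * may contain NUL bytes, so its length is passed explicitly. */
typedef void (*line_fn)(const char *line, size_t len, void *ctx);

/* Example callback: tallies lines and payload bytes. */
struct line_stats { size_t lines; size_t bytes; };

static void count_line(const char *line, size_t len, void *ctx)
{
    struct line_stats *st = ctx;
    (void)line;
    st->lines++;
    st->bytes += len;
}

/* Read fp block by block and call fn for every '\n'-separated line.
 * Returns 0 on success, -1 on a read error or an over-long line. */
static int for_each_line(FILE *fp, line_fn fn, void *ctx)
{
    assert(fp != NULL && fn != NULL);

    char buf[BLOCK_SIZE];
    size_t have = 0;            /* bytes carried over from the last block */

    for (;;) {
        size_t got = fread(buf + have, 1, sizeof buf - have, fp);
        have += got;

        char *start = buf;
        char *end = buf + have;
        char *nl;
        while ((nl = memchr(start, '\n', (size_t)(end - start))) != NULL) {
            fn(start, (size_t)(nl - start), ctx);   /* line without '\n' */
            start = nl + 1;
        }

        have = (size_t)(end - start);   /* incomplete tail of the block */
        if (got == 0) {                 /* EOF or error */
            if (ferror(fp))
                return -1;
            if (have > 0)
                fn(start, have, ctx);   /* final line without a newline */
            return 0;
        }
        if (have == sizeof buf)
            return -1;                  /* line does not fit in the block */
        memmove(buf, start, have);      /* move the tail to the front */
    }
}
```

Typical use would be to open the file with "rb", declare a struct line_stats initialised to zero, and call for_each_line(fp, count_line, &stats). A real CSV parser would replace count_line with something that splits the line into columns by length rather than by NUL terminator.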
Check the function int T_fread(FILE *input) at http://www.mrx.net/c/source.html