Split CSV files into files of exactly 1GB or a little less? [closed]
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 9 years ago.
Every month we receive an invoice file that is always bigger than 2GB. Our print house has a 1.1GB limit, and we currently do the whole process by hand.
The first step in this application would be to split those huge 2GB files into files of at most 1GB, in a way that doesn't break any CSV entry, so that each file is readable from start to end without any broken data.
How could I split the file to meet the above requirements?
Are there any libraries for this kind of processing on CSV files?
How about just copying the first 1 GB of data from the source into a new file, then searching backward for the last newline, and truncating the new file after that. Then you know how large the first file is, and you repeat the process for a second new file from that point to 1 GB later. Seems straightforward to me in just about any language (you mentioned C#, which I haven't used recently, but certainly it can easily do the job).
You didn't make it clear whether you need to copy the header line (if any) to each of the resulting files. Again, should be straightforward--just do it prior to the copying of data into each of the files.
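The copy-then-back-up-to-the-last-newline idea could be sketched roughly as follows in Python. This is only an illustration under some assumptions: the function names (`find_cut`, `split_csv`), the output naming scheme and the `repeat_header` switch are made up for the example, no quoted CSV field is assumed to contain a line break, and the file is assumed to use an encoding where a newline is the single byte `\n` (ASCII, Latin-1, UTF-8, but not UTF-16). It works on raw bytes, so the encoding itself is left untouched.

```python
import os


def find_cut(src, start, end, block_size):
    """Return the offset just past the last newline in [start, end), or None."""
    pos = end
    while pos > start:
        lo = max(start, pos - block_size)
        src.seek(lo)
        block = src.read(pos - lo)
        idx = block.rfind(b"\n")
        if idx != -1:
            return lo + idx + 1
        pos = lo  # no newline in this block, keep searching backward
    return None


def split_csv(src_path, out_prefix, max_bytes, repeat_header=True,
              block_size=1 << 20):
    """Split src_path into parts of at most max_bytes, cutting only at newlines.

    Hypothetical helper: parts are written as <out_prefix>001.csv, 002.csv, ...
    Assumes no quoted field contains an embedded line break.
    """
    size = os.path.getsize(src_path)
    parts = []
    with open(src_path, "rb") as src:
        # Optionally treat the first line as a header to repeat in every part.
        header = src.readline() if repeat_header else b""
        start = src.tell()
        budget = max_bytes - len(header)  # room left for data rows per part
        if budget <= 0:
            raise ValueError("max_bytes must be larger than the header line")
        while start < size:
            end = min(start + budget, size)
            if end < size:
                # Back up to the end of the last complete line within budget.
                end = find_cut(src, start, end, block_size)
                if end is None:
                    raise ValueError("a single line exceeds the size limit")
            part_path = f"{out_prefix}{len(parts) + 1:03d}.csv"
            with open(part_path, "wb") as out:
                out.write(header)
                src.seek(start)
                remaining = end - start
                while remaining:
                    data = src.read(min(block_size, remaining))
                    if not data:
                        break
                    out.write(data)
                    remaining -= len(data)
            parts.append(part_path)
            start = end
    return parts
```

A call such as `split_csv("invoices.csv", "invoices_part_", 1_000_000_000)` (file name hypothetical) would then produce parts that each stay under the limit, header included. If the data can contain quoted fields with embedded line breaks, a real CSV parser would be needed to find safe cut points instead of a plain newline search.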
You could also take the approach of just generically splitting the files using tar on Unix or some Zip-like utility on Windows, then telling your large-file-challenged partner to reconstruct the file from that format. Or maybe simply compressing the CSV file would work, and get you under the limit in practice.
There are just a few things you need to take care of:
- Keep the line breaks: split the file at a newline (in algorithmic terms, cut at the end of the last complete line before the 1GB limit is reached, with the limit reduced by the size of the header line)
- Copy the header to the beginning of the new file and then paste the rest
- Preserve the encoding.
In a bash/terminal prompt, write:
man split
.. then
man wc
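.. simply count the number of lines in the file (wc -l), divide it by X, and feed the result to split -l; you then have X files smaller than 1.1GB (where X = file size in GB / 1.1, rounded up). This only holds if the lines are of roughly similar length.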
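The arithmetic behind that line-count approach can be written out as a small Python sketch. The helper name `lines_per_part` is made up for the example, and the result is only a safe value for `split -l` when the lines are of similar length; it also ignores any header line that might need to be repeated in each part.

```python
import math
import os


def lines_per_part(path, max_bytes):
    """Estimate how many lines each part may hold to stay under max_bytes.

    Hypothetical helper: assumes lines are of roughly equal length.
    """
    size = os.path.getsize(path)
    # X: number of parts needed if the bytes were spread evenly.
    parts = max(1, math.ceil(size / max_bytes))
    with open(path, "rb") as f:
        total_lines = sum(1 for _ in f)  # same count as `wc -l` for newline-terminated files
    return math.ceil(total_lines / parts)
```

The returned number is what would be passed to `split -l`. With very uneven line lengths, some parts can still exceed the limit, so it is worth checking the resulting file sizes afterwards.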