Archiving thousands of files and 7zip limitations
My application runs a daily task that has to zip 100,000+ PDF files (~50 KB each). Currently I'm using 7-Zip and calling 7za.exe (the command-line tool that ships with 7-Zip) to zip each file (the files are located in many different folders).
What are the limitations of this approach, and how can they be solved? Is there a file size or file count limit for a 7-Zip archive?
The limit on file size is 16 exabytes, or 16,000,000,000 GB.
There is no hard limit on the number of files, but there is a practical limit imposed by how 7-Zip manages the headers for the files. The exact limit depends on the path lengths, but on a 32-bit system you'll run into problems somewhere around a million files.
I'm not sure whether any other format supports more. Regular zip has far smaller limits: without the ZIP64 extension, an archive can hold at most 65,535 entries and 4 GB.
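Given the practical ceiling on files per archive, one way to stay well below it is to split the file list into batches and call 7za once per batch instead of once per file. The following is only a sketch of that idea, not a tested production script. It relies on 7-Zip's `@listfile` syntax for passing file names; the helper names, the batch size of 50,000, and the `batch_NNNN` naming scheme are all assumptions made for illustration.

```python
import subprocess
from pathlib import Path
from typing import Iterator, List, Sequence


def chunked(paths: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `paths`, each with at most `size` entries."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(paths), size):
        yield list(paths[start:start + size])


def build_command(archive: str, list_file: str, exe: str = "7za") -> List[str]:
    """Build a 7za 'add' command that reads the file names from a list file."""
    return [exe, "a", archive, "@" + list_file]


def archive_in_batches(paths, out_dir, batch_size=50_000, run=subprocess.run):
    """Write one list file per batch and invoke 7za once per batch.

    `batch_size` and the output naming are hypothetical choices; `run` is
    injectable so the function can be exercised without 7za installed.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    archives = []
    for index, batch in enumerate(chunked(paths, batch_size)):
        list_file = out / f"batch_{index:04d}.txt"
        # One file name per line, UTF-8 encoded.
        list_file.write_text("\n".join(batch) + "\n", encoding="utf-8")
        archive = out / f"batch_{index:04d}.7z"
        run(build_command(str(archive), str(list_file)), check=True)
        archives.append(str(archive))
    return archives
```

Calling `archive_in_batches(all_pdf_paths, "out")` with 100,000 paths and the default batch size would start 7za twice rather than 100,000 times. Because the PDFs live in many folders, it is probably safest to pass paths relative to a common root so that files with the same name in different folders do not collide inside the archive.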
http://en.wikipedia.org/wiki/7-Zip
One notable limitation of 7-Zip is that, while it supports file sizes of up to 16 exabytes, it has an unusually high overhead allocating memory for files, on top of the memory requirements for performing the actual compression.
Approximately 1 kilobyte is required per file (more if the pathname is very long), and the file listing alone can grow to an order of magnitude greater than the memory required to do the actual compression. In real-world terms, this means 32-bit systems cannot compress more than a million or so files in one archive, as the memory requirements exceed the 2 GB process limit.
64-bit systems do not suffer from the same process size limitation, but still require several gigabytes of RAM to handle that many files. Archives created on such systems would, however, be unusable on machines with less memory.
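To see what the figures quoted above mean for a given job, the arithmetic can be sketched as below. This is a back-of-the-envelope estimate only: the 1 KB per file and the 2 GB process limit come from the quoted text, while the function names and the compression-memory figure passed in are assumptions, and the overhead of a real 7-Zip run will differ.

```python
KIB = 1024
GIB = 1024 ** 3


def estimate_listing_bytes(file_count: int, bytes_per_file: int = KIB) -> int:
    """Rough memory for the file listing, at about 1 KB per file."""
    return file_count * bytes_per_file


def fits_in_process(file_count: int,
                    compression_bytes: int,
                    process_limit: int = 2 * GIB,
                    bytes_per_file: int = KIB) -> bool:
    """Check whether listing plus compression memory stays under the limit.

    `compression_bytes` is whatever the chosen compression settings need;
    the caller has to supply it, since it is not derived here.
    """
    total = estimate_listing_bytes(file_count, bytes_per_file) + compression_bytes
    return total <= process_limit
```

Under these assumptions, the 100,000 files from the question need roughly 100 MB for the listing, which is comfortably below the 32-bit limit, whereas a couple of million files plus 1 GB of compression memory would not fit.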