java.util.zip - ZipInputStream vs. ZipFile
I have some general questions regarding the java.util.zip library.
What we basically do is an import and an export of many small components. Previously these components were imported and exported using a single big file, e.g.:
<component-type-a id="1"/>
<component-type-a id="2"/>
<component-type-a id="N"/>
<component-type-b id="1"/>
<component-type-b id="2"/>
<component-type-b id="N"/>
Please note that the order of the components during import is relevant.
Now every component should occupy its own file, which should be externally versioned, QA-ed, and so on. We decided that the output of our export should be a zip file (with all these files in it) and the input of our import should be a similar zip file. We do not want to explode the zip in our system. We do not want to open separate streams for each of the small files. My current questions:
Q1. Can ZipInputStream guarantee that the zip entries (the little files) will be read in the same order in which they were inserted by our export, which uses ZipOutputStream? I assume reading is something like:
// fis is an already opened FileInputStream on the zip file
ZipInputStream zis = new ZipInputStream(new BufferedInputStream(fis));
ZipEntry entry;
while ((entry = zis.getNextEntry()) != null)
{
    // read from zis until read() returns -1 (end of the current entry)
}
I know that the central zip directory is put at the end of the zip file but nevertheless the file entries inside have sequential order. I also know that relying on the order is an ugly idea but I just want to have all the facts in mind.
Q2. If I use ZipFile (which I prefer), what is the performance impact of calling getInputStream() hundreds of times? Will it be much slower than the ZipInputStream solution? The zip is opened only once and ZipFile is backed by RandomAccessFile - is this correct?
I assume reading is something like:
ZipFile zipfile = new ZipFile(argv[0]);
Enumeration<? extends ZipEntry> e = zipfile.entries(); // TODO: assure the order of the entries
while (e.hasMoreElements()) {
    ZipEntry entry = e.nextElement();
    InputStream is = zipfile.getInputStream(entry);
    // read from is, then close it
}
Q3. Are the input streams retrieved from the same ZipFile thread safe (e.g. may I read different entries in different threads simultaneously)? Any performance penalties?
Thanks for your answers!
Q1: Yes, the entries will be read in the same order in which they were added.
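If you want to check the ordering claim on your own JDK, here is a minimal sketch of a round trip. The class name ZipOrderCheck, the helper names writeZip and readNames, and the sample entry names are made up for illustration; it works on an in-memory zip rather than a file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class, not part of the original question.
public class ZipOrderCheck {

    // Writes one small entry per name, in the given order, into an in-memory zip.
    public static byte[] writeZip(List<String> names) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            for (String name : names) {
                zos.putNextEntry(new ZipEntry(name));
                zos.write(("<component file=\"" + name + "\"/>").getBytes(StandardCharsets.UTF_8));
                zos.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    // Returns the entry names in the order ZipInputStream encounters them.
    public static List<String> readNames(byte[] zip) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zip))) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Sample names only; the order is deliberately not alphabetical.
        List<String> written = Arrays.asList("b-2.xml", "a-1.xml", "b-1.xml", "a-2.xml");
        List<String> read = readNames(writeZip(written));
        System.out.println("written: " + written);
        System.out.println("read:    " + read);
    }
}
```

This only covers archives produced by ZipOutputStream. A zip that was later rewritten by another tool may have its entries in a different physical order.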
Q2: Note that due to the structure of zip archive files, and compression, neither solution is exactly streaming; they both do some level of buffering. And if you check the JDK sources, the implementations share most of their code. There is no real random access within the content, although the index does allow finding the chunks that correspond to entries. So I think there should not be meaningful performance differences, especially as the OS will cache disk blocks anyway. You may want to verify this with a simple performance test case.
Q3: I would not count on this; most likely they aren't. If you really think concurrent access would help (mostly because decompression is CPU bound, so it might), I'd try reading the whole file into memory, exposing it via ByteArrayInputStream, and constructing multiple independent readers.
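A rough sketch of that in-memory idea, assuming the archive fits comfortably in memory. The class InMemoryZipReaders and its methods readEntry and readAll are hypothetical names for illustration. Every task gets its own ZipInputStream over the shared byte array, so no stream object is shared between threads.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper class illustrating independent readers over one byte array.
public class InMemoryZipReaders {

    // Scans a private ZipInputStream and returns the named entry's content, or null if absent.
    public static byte[] readEntry(byte[] zipBytes, String name) throws IOException {
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                if (entry.getName().equals(name)) {
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    byte[] buf = new byte[8192];
                    int n;
                    while ((n = zis.read(buf)) != -1) {
                        out.write(buf, 0, n);
                    }
                    return out.toByteArray();
                }
            }
        }
        return null;
    }

    // Reads the requested entries on a thread pool; the result keeps the order of 'names'.
    public static Map<String, byte[]> readAll(byte[] zipBytes, List<String> names, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Map<String, Future<byte[]>> pending = new LinkedHashMap<>();
            for (String name : names) {
                Callable<byte[]> task = () -> readEntry(zipBytes, name);
                pending.put(name, pool.submit(task));
            }
            Map<String, byte[]> result = new LinkedHashMap<>();
            for (Map.Entry<String, Future<byte[]>> e : pending.entrySet()) {
                result.put(e.getKey(), e.getValue().get());
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // args[0] is the zip path, the remaining arguments are entry names to read.
        byte[] zipBytes = Files.readAllBytes(Paths.get(args[0]));
        List<String> names = Arrays.asList(args).subList(1, args.length);
        for (Map.Entry<String, byte[]> e : readAll(zipBytes, names, 4).entrySet()) {
            byte[] data = e.getValue();
            System.out.println(e.getKey() + ": " + (data == null ? "missing" : data.length + " bytes"));
        }
    }
}
```

Each task rescans the archive from the start, so this trades extra scanning for isolation between readers. Whether it is actually faster than a single sequential pass is something to measure.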
I measured that just listing the files with ZipInputStream is 8 times slower than with ZipFile.
long t = System.nanoTime();
ZipFile zip = new ZipFile(jarFile);
Enumeration<? extends ZipEntry> entries = zip.entries();
while (entries.hasMoreElements())
{
    ZipEntry entry = entries.nextElement();
    String filename = entry.getName();
    if (!filename.startsWith(JAR_TEXTURE_PATH))
        continue;
    textureFiles.add(filename);
}
zip.close();
System.out.println((System.nanoTime() - t) / 1e9);
and
long t = System.nanoTime();
ZipInputStream zip = new ZipInputStream(new FileInputStream(jarFile));
ZipEntry entry;
while ((entry = zip.getNextEntry()) != null)
{
    String filename = entry.getName();
    if (!filename.startsWith(JAR_TEXTURE_PATH))
        continue;
    textureFiles.add(filename);
}
zip.close();
System.out.println((System.nanoTime() - t) / 1e9);
(Don't run them in the same class. Make two different classes and run them separately.)
Regarding Q3, experience in JENKINS-14362 suggests that zlib is not thread-safe even when operating on unrelated streams, i.e. that it has some improperly shared static state. Not proven, just a warning.
Using ZipFile.getInputStream() is significantly faster than using new ZipInputStream(). Just try it yourself.
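For anyone who wants to try it, here is a small sketch that reads the full content of every entry with either API and prints the elapsed time. The class name ZipReadComparison, its method names, and the command-line convention (mode first, then the zip path) are assumptions for illustration. It runs one mode per JVM launch so the two measurements do not share warm-up, and it is not a rigorous benchmark.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;

// Hypothetical timing harness; run it once per mode and compare the printed times.
public class ZipReadComparison {

    // Reads a stream to its end and returns the number of bytes seen.
    private static long drain(InputStream in) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            total += n;
        }
        return total;
    }

    // Reads every entry's content through ZipFile.getInputStream().
    public static long readWithZipFile(File file) throws IOException {
        long total = 0;
        try (ZipFile zip = new ZipFile(file)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                try (InputStream in = zip.getInputStream(entries.nextElement())) {
                    total += drain(in);
                }
            }
        }
        return total;
    }

    // Reads every entry's content through a single ZipInputStream.
    public static long readWithZipInputStream(File file) throws IOException {
        long total = 0;
        try (ZipInputStream zis = new ZipInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            while (zis.getNextEntry() != null) {
                total += drain(zis); // read() returns -1 at the end of the current entry
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: "zipfile" or "stream"; args[1]: path to the zip archive.
        File file = new File(args[1]);
        long t = System.nanoTime();
        long bytes = "zipfile".equals(args[0]) ? readWithZipFile(file) : readWithZipInputStream(file);
        System.out.println(args[0] + ": " + bytes + " bytes in " + (System.nanoTime() - t) / 1e9 + " s");
    }
}
```

Both methods return the total number of uncompressed bytes, which gives a quick sanity check that the two runs read the same data.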