Apache Commons ZipArchiveOutputStream breaks when adding filenames with non-ASCII characters
I am using an org.apache.commons.compress.archivers.zip.ZipArchiveOutputStream to add files coming from a Subversion repository. This works fine as long as the filename contains no German umlauts (ä, ö, ü) or other special characters. What is the quickest way to make it accept non-ASCII characters?
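import java.io.{BufferedOutputStream, ByteArrayInputStream, ByteArrayOutputStream, File, OutputStream}
import org.apache.commons.compress.archivers.zip.{ZipArchiveEntry, ZipArchiveOutputStream}
import org.apache.commons.io.IOUtils // assumed: commons-io's IOUtils.copy(InputStream, OutputStream)
import org.tmatesoft.svn.core.{SVNDirEntry, SVNProperties}
import org.tmatesoft.svn.core.io.SVNRepository
def zip(repo: SVNRepository, out: OutputStream, url: String, resourceList: Seq[SVNResource]): Unit = {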
val zout = new ZipArchiveOutputStream(new BufferedOutputStream(out))
zout.setEncoding("Cp437");
zout.setFallbackToUTF8(true);
zout.setUseLanguageEncodingFlag(true);
zout.setCreateUnicodeExtraFields(ZipArchiveOutputStream.UnicodeExtraFieldPolicy.NOT_ENCODEABLE);
try {
for (resource <- resourceList) {
addFileToStream(repo, zout, resource)
}
}
finally {
zout.finish
zout.close
}
}
private def addFileToStream(repo: SVNRepository, zout: ZipArchiveOutputStream, resource: SVNResource): ZipArchiveOutputStream = {
val entry = resource.entry
val url = YSTRepo.getAbsolutePath(entry)
if (FILE == entry.getKind.toString) {
val file = new File(url)
val zipEntry = new ZipArchiveEntry(file, url)
zout.putArchiveEntry(zipEntry)
val baos = new ByteArrayOutputStream()
val fileprops = new SVNProperties()
repo.getFile(url, -1, fileprops, baos)
IOUtils.copy(new ByteArrayInputStream(baos.toByteArray), zout)
zout.closeArchiveEntry
} else if (DIR == entry.getKind.toString) {
if (resource.hasChildren) {
val dirProps = new SVNProperties()
val entries = repo.getDir(url, -1, dirProps, new java.util.ArrayList[SVNDirEntry])
// getDir returns a raw java.util.Collection; convert it to a Seq[SVNDirEntry]
for (child <- SVNResource.listDir(repo, entries.toArray().toSeq.map(_.asInstanceOf[SVNDirEntry]))) {
addFileToStream(repo, zout, child)
}
}
}
zout
}
I solved the issue by changing UnicodeExtraFieldPolicy.NOT_ENCODEABLE to UnicodeExtraFieldPolicy.ALWAYS in the setCreateUnicodeExtraFields call.
Filenames are now displayed correctly by Linux unzip, Windows Compressed Folders, IZArc and WinZip.
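A plausible explanation for why ALWAYS helped, assuming Commons Compress decides "not encodeable" with a charset check along these lines: ä, ö and ü all exist in Cp437, so under NOT_ENCODEABLE these names are written as plain Cp437 bytes without a Unicode extra field, and setFallbackToUTF8 (which only applies to names the encoding cannot represent) never kicks in either. ALWAYS adds the Unicode extra field to every entry, so tools that read it no longer depend on guessing the charset. The sketch below only uses the JDK's CharsetEncoder.canEncode; Cp437Check and encodable are hypothetical names, not part of Commons Compress, and it assumes the JDK ships the Cp437 (IBM437) charset, as full JDK builds normally do.

```scala
import java.nio.charset.Charset

// Hypothetical helper, not part of Commons Compress: reports whether a name can be
// represented in a charset, roughly the test behind UnicodeExtraFieldPolicy.NOT_ENCODEABLE.
object Cp437Check {
  def encodable(name: String, charsetName: String): Boolean =
    Charset.forName(charsetName).newEncoder().canEncode(name)

  def main(args: Array[String]): Unit =
    // Umlauts are in Cp437, so only the euro sign is "not encodeable" here.
    for (name <- Seq("Bericht.txt", "Übersicht_äöü.txt", "Preis_€.txt"))
      println(s"$name encodable in Cp437: ${encodable(name, "Cp437")}")
}
```

Running main prints true for the first two names and false only for the one containing the euro sign, which is why NOT_ENCODEABLE left the umlaut names without a Unicode extra field.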
Based on your comments, it sounds like the real problem is the Linux unzip program and/or the encoding supported by your Linux filesystem. One solution is to pass the -U option to unzip, which makes it escape non-ASCII (Unicode) characters in filenames instead of writing them out as-is.
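To see why a reader that assumes the wrong charset shows garbage, this sketch encodes "ä" with one charset and decodes the bytes with the other. FilenameMojibake, misread and hex are hypothetical names; only JDK charset APIs are used, and it again assumes the JDK provides Cp437.

```scala
import java.nio.charset.{Charset, StandardCharsets}

// Shows how one filename becomes different bytes depending on the writer's charset,
// and what a reader that assumes the other charset would display.
object FilenameMojibake {
  private val Cp437: Charset = Charset.forName("Cp437") // IBM PC code page, the traditional ZIP default

  // Encode with the writer's charset, then decode with the reader's charset.
  def misread(name: String, writer: Charset, reader: Charset): String =
    new String(name.getBytes(writer), reader)

  // Formats bytes as upper-case hex pairs separated by spaces.
  def hex(bytes: Array[Byte]): String = bytes.map(b => f"${b & 0xff}%02X").mkString(" ")

  def main(args: Array[String]): Unit = {
    val name = "ä"
    println(s"UTF-8 bytes: ${hex(name.getBytes(StandardCharsets.UTF_8))}") // C3 A4
    println(s"Cp437 bytes: ${hex(name.getBytes(Cp437))}")                  // 84
    // A UTF-8 name read as Cp437 turns into two box-drawing/accented characters.
    println(s"UTF-8 name read as Cp437: ${misread(name, StandardCharsets.UTF_8, Cp437)}")
    // A lone Cp437 byte 0x84 is invalid UTF-8 and decodes to the replacement character.
    println(s"Cp437 name read as UTF-8: ${misread(name, Cp437, StandardCharsets.UTF_8)}")
  }
}
```

"ä" written as UTF-8 and read as Cp437 comes out as "├ñ", while the Cp437 byte read as UTF-8 becomes U+FFFD, which matches the kind of garbled names a mismatched unzip produces.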
That said, I also recommend removing the following lines when you write your ZIP file:
zout.setEncoding("Cp437");
zout.setFallbackToUTF8(true);
zout.setUseLanguageEncodingFlag(true);
and replacing them with the following:
zout.setEncoding("UTF-8");
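This should give the most portable result: UTF-8 can represent any filename, and because ZipArchiveOutputStream sets the language encoding flag by default when the encoding is UTF-8, readers that honour the flag will decode the names correctly.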
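Commons Compress is not part of the JDK, so as a dependency-free illustration of the same idea (UTF-8 entry names marked with the language encoding flag), the sketch below plainly swaps in java.util.zip rather than Commons Compress: its ZipOutputStream encodes names as UTF-8 by default and sets general purpose bit 11 in each local file header. Utf8ZipRoundTrip and its methods are hypothetical names; the archive is built in memory.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.util.zip.{ZipEntry, ZipInputStream, ZipOutputStream}
import scala.collection.mutable.ListBuffer

// Stand-in for the UTF-8 advice using java.util.zip, not Commons Compress.
object Utf8ZipRoundTrip {
  val LanguageEncodingFlag = 0x0800 // bit 11 of the ZIP general purpose flag

  // Builds an in-memory archive with one small entry per name.
  def writeZip(names: Seq[String]): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val zout = new ZipOutputStream(buffer) // entry names default to UTF-8
    try {
      for (name <- names) {
        zout.putNextEntry(new ZipEntry(name))
        zout.write(name.getBytes("UTF-8"))
        zout.closeEntry()
      }
    } finally zout.close()
    buffer.toByteArray
  }

  // Lists entry names as decoded by ZipInputStream (also UTF-8 by default).
  def readNames(zip: Array[Byte]): Seq[String] = {
    val zin = new ZipInputStream(new ByteArrayInputStream(zip))
    val names = ListBuffer.empty[String]
    try {
      var entry = zin.getNextEntry
      while (entry != null) {
        names += entry.getName
        entry = zin.getNextEntry
      }
    } finally zin.close()
    names.toList
  }

  // General purpose flag of the first local file header: little-endian, byte offset 6.
  def firstEntryFlag(zip: Array[Byte]): Int =
    (zip(6) & 0xff) | ((zip(7) & 0xff) << 8)

  def main(args: Array[String]): Unit = {
    val zip = writeZip(Seq("Übersicht.txt", "Berichte/Bericht_äöü.txt"))
    println(readNames(zip))
    println((firstEntryFlag(zip) & LanguageEncodingFlag) != 0) // true
  }
}
```

The names survive the round trip unchanged, and the set bit 11 is what tells other readers to treat them as UTF-8 rather than Cp437.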
You can try passing the filename through URLEncoder first: http://download.oracle.com/javase/6/docs/api/java/net/URLEncoder.html
That will ensure that the zipped filename is pure ASCII.
When reading it back, use URLDecoder to recover the original UTF-8 filename: http://download.oracle.com/javase/6/docs/api/java/net/URLDecoder.html
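A minimal sketch of the URLEncoder/URLDecoder idea, assuming entry paths use "/" as the separator. It encodes each path segment separately so that "/" stays a real directory separator inside the archive (encoding the whole path would turn it into %2F and flatten the directory structure). AsciiEntryNames, encodePath and decodePath are hypothetical names; only the JDK's URLEncoder.encode(String, String) and URLDecoder.decode(String, String) are used.

```scala
import java.net.{URLDecoder, URLEncoder}
import java.nio.charset.StandardCharsets

// Percent-encodes each path segment so the stored entry name is pure ASCII.
object AsciiEntryNames {
  private val Utf8 = StandardCharsets.UTF_8.name()

  // split with limit -1 keeps a trailing empty segment, so "dir/" stays "dir/".
  def encodePath(path: String): String =
    path.split("/", -1).map(segment => URLEncoder.encode(segment, Utf8)).mkString("/")

  def decodePath(path: String): String =
    path.split("/", -1).map(segment => URLDecoder.decode(segment, Utf8)).mkString("/")

  def main(args: Array[String]): Unit = {
    val original = "Berichte/Übersicht äöü.txt"
    val encoded = encodePath(original)
    println(encoded)                         // Berichte/%C3%9Cbersicht+%C3%A4%C3%B6%C3%BC.txt
    println(decodePath(encoded) == original) // true
  }
}
```

The trade-off is that every other ZIP tool will show the escaped form (for example %C3%A4 instead of ä), so this only makes sense when the same program both writes and reads the archive.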