Apache Commons ZipArchiveOutputStream breaks upon adding Filenames with non ASCII Chars

I am using an org.apache.commons.compress.archivers.zip.ZipArchiveOutputStream to add files coming from a Subversion repository. This works fine as long as I do not use German Umlauts (ä,ö,ü) or any other special characters in the filename. I am wondering what would be the fastest way to make it accept non ASCII chars?

def zip(repo: SVNRepository, out: OutputStream, url: String, resourceList: Seq[SVNResource]) {
  val zout = new ZipArchiveOutputStream(new BufferedOutputStream(out))
  zout.setEncoding("Cp437");
  zout.setFallbackToUTF8(true);
  zout.setUseLanguageEncodingFlag(true);
  zout.setCreateUnicodeExtraFields(ZipArchiveOutputStream.UnicodeExtraFieldPolicy.NOT_ENCODEABLE);
  try {
    for (resource <- resourceList) {
      addFileToStream(repo, zout, resource)
    }
  }
  finally {
    zout.finish
    zout.close
  }
}

private def addFileToStream(repo: SVNRepository, zout: ZipArchiveOutputStream, resource:SVNResource): ZipArchiveOutputStream = {
  val entry = resource.entry
  val url = YSTRepo.getAbsolutePath(entry)
  if (FILE == entry.getKind.toString) {
    val file = new File(url)
    val zipEntry = new ZipArchiveEntry(file, url)   
    zout.putArchiveEntry(zipEntry)
    val baos = new ByteArrayOutputStream()
    val fileprops = new SVNProperties()
    repo.getFile(url, -1, fileprops, baos)
    IOUtils.copy(new ByteArrayInputStream(baos.toByteArray), zout)
    zout.closeArchiveEntry
  } else if (DIR == entry.getKind.toString) {
    if (resource.hasChildren) {
      val dirProps = new SVNProperties()
      val entries = repo.getDir(url, -1, dirProps, new java.util.ArrayList[SVNDirEntry])
      for (child <- SVNResource.listDir(repo, entries.toList.asInstanceOf[Seq[SVNDirEntry]])) {
        addFileToStream(repo, zout, child)
      }
    }
  }
  zout
}


I solved the issue by setting

UnicodeExtraFieldPolicy.NOT_ENCODEABLE 

to

UnicodeExtraFieldPolicy.ALWAYS

Filenames are now displayed correctly using Linux-Unzip, Windows-Compressed-Folders, IZArc and WINZIP.
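
A probable reason this works: ä, ö and ü can all be encoded in Cp437, so with NOT_ENCODEABLE no Unicode extra field is written for them, and any tool that does not assume Cp437 shows the wrong characters. With ALWAYS, every entry also carries its name in UTF-8. Below is a minimal round-trip sketch of that setting, assuming commons-compress is on the classpath. The object and method names (UnicodeZipSketch, zipOne, firstEntryName) are made up for illustration and are not part of my original code.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import org.apache.commons.compress.archivers.zip.{ZipArchiveEntry, ZipArchiveInputStream, ZipArchiveOutputStream}
import org.apache.commons.compress.archivers.zip.ZipArchiveOutputStream.UnicodeExtraFieldPolicy

// Hypothetical helper object, for illustration only.
object UnicodeZipSketch {

  // Writes a single entry into an in-memory archive and returns the archive bytes.
  def zipOne(name: String, content: Array[Byte]): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val zout = new ZipArchiveOutputStream(bytes)
    zout.setEncoding("Cp437")
    // Always add the Unicode path extra field, even if the name fits into Cp437.
    zout.setCreateUnicodeExtraFields(UnicodeExtraFieldPolicy.ALWAYS)
    try {
      zout.putArchiveEntry(new ZipArchiveEntry(name))
      zout.write(content)
      zout.closeArchiveEntry()
      zout.finish()
    } finally {
      zout.close()
    }
    bytes.toByteArray
  }

  // Reads the name of the first entry, honouring Unicode extra fields (third argument).
  def firstEntryName(zip: Array[Byte]): String = {
    val zin = new ZipArchiveInputStream(new ByteArrayInputStream(zip), "Cp437", true)
    try zin.getNextEntry().getName
    finally zin.close()
  }
}
```

Calling UnicodeZipSketch.firstEntryName on the bytes returned by UnicodeZipSketch.zipOne should give back the name that was passed in, umlauts included.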


Based on your comments, it sounds like the real problem is with the Linux unzip program and/or the encoding supported by your Linux filesystem. One solution is to pass the -U option to unzip, which will escape any Unicode characters in filenames.

That said, I also recommend removing the following lines when you write your ZIP file:

zout.setEncoding("Cp437");
zout.setFallbackToUTF8(true);
zout.setUseLanguageEncodingFlag(true);

And replace them with the following:

zout.setEncoding("UTF-8");

This should result in the highest portability.


You can try passing the filename through URLEncoder first: http://download.oracle.com/javase/6/docs/api/java/net/URLEncoder.html

That will ensure that the zipped filename is pure ASCII.

When reading it back out, use URLDecoder to recover the original Unicode filename: http://download.oracle.com/javase/6/docs/api/java/net/URLDecoder.html
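
A rough sketch of that round trip, using only java.net classes. The object name AsciiEntryNames and its two methods are hypothetical. One assumption I am adding: each path segment is encoded separately, because URLEncoder would otherwise turn "/" into %2F and flatten the directory structure inside the archive.

```scala
import java.net.{URLDecoder, URLEncoder}

// Hypothetical helper object, for illustration only.
object AsciiEntryNames {

  // Percent-encodes every path segment as UTF-8 but keeps "/" as the separator.
  def encode(name: String): String =
    name.split("/", -1).map(segment => URLEncoder.encode(segment, "UTF-8")).mkString("/")

  // Reverses encode(); URLDecoder leaves "/" and other plain ASCII characters untouched.
  def decode(name: String): String =
    URLDecoder.decode(name, "UTF-8")
}
```

With this, "häßlich/Übung 1.txt" becomes "h%C3%A4%C3%9Flich/%C3%9Cbung+1.txt" in the archive. The obvious downside is that anyone who unpacks the file with an ordinary unzip tool sees the encoded names, so this only helps if your own code also does the reading.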
