Processing a BZIP string/file in Scala

I'm punishing myself a bit by doing the Python Challenge series in Scala.

Now, one of the challenges is to read in a string that's been compressed using the bzip algorithm and output the result.

BZh91AY&SYA\xaf\x82\r\x00\x00\x01\x01\x80\x02\xc0\x02\x00 \x00!\x9ah3M\x07<]\xc9\x14\xe1BA\x06\xbe\x084

After some digging, it appears there isn't a standard Java library for bzip2 processing, but there is an implementation in the Apache Ant project, which someone has kindly extracted for use as a separate library.

The thing is, I can't get it to work with the following code: it just hangs in the Scala REPL, and the JVM maxes out at 100% CPU usage.

This is the code I'm trying...

import java.io.{ByteArrayInputStream}
import org.apache.tools.bzip2.{CBZip2InputStream}
import org.apache.commons.io.{IOUtils}
object ChallengeEight extends Application {
    val inputString = """BZh91AY&SYA\xaf\x82\r\x00\x00\x01\x01\x80\x02\xc0\x02\x00 \x00!\x9ah3M\x07<]\xc9\x14\xe1BA\x06\xbe\x084"""
    val inputStream = new ByteArrayInputStream( inputString.getBytes("UTF-8") ) //convert string to inputstream
    inputStream.skip(2) //skip the 'BZ' part at the start
    val bzipInputStream = new CBZip2InputStream(inputStream)  //hangs here....
    val result = IOUtils.toString(bzipInputStream, "UTF-8");
    println(result)
}

Does anyone have any ideas? Or is the CBZip2InputStream class expecting some extra bytes that you might find in a file that has been compressed with bzip2?

Any help would be appreciated.

EDIT: For the record, this is the Python solution:

import bz2

un = "BZh91AY&SYA\xaf\x82\r\x00\x00\x01\x01\x80\x02\xc0\x02\x00 \x00!" \
     "\x9ah3M\x07<]\xc9\x14\xe1BA\x06\xbe\x084"

print bz2.decompress(un)


To get the binary characters into a Scala string, write each one as a unicode escape of the form \uXXXX, where XXXX is the four-digit hexadecimal code point of the character.

val un = "BZh91AY&SYA\u00af\u0082\r\u0000\u0000\u0001\u0001\u0080\u0002\u00c0\u0002\u0000 \u0000!\u009ah3M\u0007<]\u00c9\u0014\u00e1BA\u0006\u00be\u00084"
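One further point when turning such a string back into bytes: the code in the question calls getBytes("UTF-8"), and UTF-8 encodes every character above U+007F as two bytes, so the compressed data would no longer match the original. The sketch below uses only the first few characters of the challenge string and the standard library; the object and value names are made up for illustration. It compares the two encodings, and ISO-8859-1 keeps one byte per character.

```scala
import java.nio.charset.StandardCharsets

object BytesFromEscapes {
  // First 13 characters of the challenge string, written with unicode escapes.
  val prefix: String = "BZh91AY&SYA\u00af\u0082"

  // ISO-8859-1 maps each char in U+0000..U+00FF to the single byte of the same value.
  val latin1: Array[Byte] = prefix.getBytes(StandardCharsets.ISO_8859_1)

  // UTF-8 encodes each char above U+007F as two bytes, which alters the data.
  val utf8: Array[Byte] = prefix.getBytes(StandardCharsets.UTF_8)

  def main(args: Array[String]): Unit = {
    println(latin1.length)      // 13
    println(utf8.length)        // 15
    println(latin1(11) & 0xff)  // 175, i.e. 0xaf
  }
}
```

The ISO-8859-1 byte array is what would then be wrapped in the ByteArrayInputStream from the question.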


You are enclosing your string in triple quotes, which means you pass the literal characters (a backslash, an x, and two hex digits) to the algorithm rather than the control/binary characters they represent.
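A minimal sketch of the difference, using a single escape from the challenge string (the object and value names are made up for illustration):

```scala
object RawVsEscaped {
  // Triple-quoted literal: the backslash, 'x', 'a' and 'f' stay as four separate characters.
  val raw: String = """\xaf"""

  // Regular literal with a unicode escape: one character, code point 0xAF.
  val escaped: String = "\u00af"

  def main(args: Array[String]): Unit = {
    println(raw.length)               // 4
    println(escaped.length)           // 1
    println(escaped.charAt(0).toInt)  // 175
  }
}
```

Note that Scala's regular string literals do not accept Python-style hex escapes such as the ones in the original string, so removing the triple quotes alone will not compile; the escapes have to be rewritten as well.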
