Processing a BZIP string/file in Scala
I'm punishing myself a bit by doing the Python Challenge series in Scala.
Now, one of the challenges is to read in a string that's been compressed using the bzip algorithm and output the result.
BZh91AY&SYA\xaf\x82\r\x00\x00\x01\x01\x80\x02\xc0\x02\x00 \x00!\x9ah3M\x07<]\xc9\x14\xe1BA\x06\xbe\x084
After some digging, it appears there is no standard Java library for bzip2 processing, but the Apache Ant project contains an implementation, which someone has kindly extracted for use as a separate library.
The thing is, I can't get it to work with the following code: it just hangs in the Scala REPL, and the JVM maxes out at 100% CPU usage.
This is the code I'm trying:
import java.io.{ByteArrayInputStream}
import org.apache.tools.bzip2.{CBZip2InputStream}
import org.apache.commons.io.{IOUtils}
object ChallengeEight extends Application {
  val inputString = """BZh91AY&SYA\xaf\x82\r\x00\x00\x01\x01\x80\x02\xc0\x02\x00 \x00!\x9ah3M\x07<]\xc9\x14\xe1BA\x06\xbe\x084"""
  val inputStream = new ByteArrayInputStream( inputString.getBytes("UTF-8") ) //convert string to inputstream
  inputStream.skip(2) //skip the 'BZ' part at the start
  val bzipInputStream = new CBZip2InputStream(inputStream) //hangs here....
  val result = IOUtils.toString(bzipInputStream, "UTF-8");
  println(result)
}
Anyone got any ideas? Or is the CBZip2InputStream class expecting some extra bytes that you might find in a file that has been zipped with bzip2?
Any help would be appreciated.
EDIT: For the record, this is the Python solution:
import bz2
un = "BZh91AY&SYA\xaf\x82\r\x00\x00\x01\x01\x80\x02\xc0\x02\x00 \x00!" \
"\x9ah3M\x07<]\xc9\x14\xe1BA\x06\xbe\x084"
print bz2.decompress(un)
To escape characters, use a Unicode escape sequence of the form \uXXXX, where XXXX is the four-digit hexadecimal code of the Unicode character:
val un = "BZh91AY&SYA\u00af\u0082\r\u0000\u0000\u0001\u0001\u0080\u0002\u00c0\u0002\u0000 \u0000!\u009ah3M\u0007<]\u00c9\u0014\u00e1BA\u0006\u00be\u00084"
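You are enclosing your string in triple quotes, which means you pass the literal characters (a backslash, an x, and so on) to the algorithm rather than the control/binary characters they represent. Scala also has no \xNN escape, so the Python literal cannot be pasted in unchanged even with ordinary quotes.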
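To see the difference, here is a small sketch using only the standard library. The object name EscapeDemo is made up for illustration. It also shows a second likely pitfall in the question's code, which is my own observation rather than something confirmed above: getBytes("UTF-8") turns every character above U+007F into two bytes, so the compressed data would probably still be corrupted even with correct escapes. ISO-8859-1 maps each character in the range U+0000 to U+00FF to exactly one byte with the same value.

```scala
import java.nio.charset.StandardCharsets

object EscapeDemo {
  // Triple quotes keep the backslash: four literal characters \ x a f.
  val raw: String = """\xaf"""

  // A unicode escape in an ordinary string yields one character, U+00AF.
  val escaped: String = "\u00af"

  // UTF-8 encodes U+00AF as two bytes (0xC2 0xAF), which changes binary data.
  val utf8: Array[Byte] = escaped.getBytes(StandardCharsets.UTF_8)

  // ISO-8859-1 encodes U+00AF as the single byte 0xAF.
  val latin1: Array[Byte] = escaped.getBytes(StandardCharsets.ISO_8859_1)

  def main(args: Array[String]): Unit = {
    println(raw.length)     // 4
    println(escaped.length) // 1
    println(utf8.length)    // 2
    println(latin1.length)  // 1
  }
}
```

Applied to the question's code, that would mean building the stream from un.getBytes("ISO-8859-1") instead of inputString.getBytes("UTF-8"), assuming the rest of the stream setup is otherwise correct.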