How can I tell groovy/grails not to try to "re-encode" binary data? (Revised title)
I have a Groovy/Grails application that needs to serve images.
It works fine on my dev box: the image is returned properly. Here's the start of the returned JPEG, as seen by od -cx:
0000000 377 330 377 340 \0 020 J F I F \0 001 001 001 001 ,
d8ff e0ff 1000 464a 4649 0100 0101 2c01
But on the production box there is garbage in front, and the d8ff e0ff before the 1000 is missing:
0000000 � ** ** � ** ** � ** ** � ** ** \0 020 J F
bfef efbd bdbf bfef efbd bdbf 1000 464a
0000020 I F \0 001 001 001 \0 H \0 H \0 \0 � ** ** �
4649 0100 0101 4800 4800 0000 bfef efbd
It's the exact same code. I just moved the .war over and ran it on a different machine. (Isn't Java supposed to be write once, run everywhere?)
Any ideas? An "encoding" problem?
The image is sent to the response like this:
response.contentType = "image/jpeg"; response.outputStream << out;
Here's the code that locates the image on an internal application server and re-serves the image. I've pared down the code a bit to remove the error handling, etc, to make it easier to read.
def show = {
    def address = "http://internal.application.server:9899/img?photoid=${params.id}"
    def out = new ByteArrayOutputStream()
    out << new URL(address).openStream()
    response.contentLength = out.size()
    // XXX If you don't do this hack, "head" requests won't work!
    if (request.method == 'HEAD') {
        render(text: "", contentType: "image/jpeg")
    } else {
        response.contentType = "image/jpeg"
        response.outputStream << out
    }
}
Update: I tried setting the character encoding:
response.setCharacterEncoding("ISO-8859-1")
if (request.method == 'HEAD') {
    render(text: "", contentType: "image/jpeg")
} else {
    response.contentType = "image/jpeg;charset=ISO-8859-1"
    response.outputStream << out
}
but it made no difference in the output. On my production machine, the binary bytes in the image are re-encoded/escaped as if they were UTF-8 (see Michael's explanation below). It works fine on my development machine.
An "encoding" problem?
Absolutely. The sequence "bfef efbd bdbf bfef efbd bdbf" is four repeats of the bytes EF BF BD (od -x prints them as byte-swapped, little-endian 16-bit words), which is the UTF-8 encoding of the U+FFFD REPLACEMENT CHARACTER code point. So at some point your binary data is being interpreted as UTF-8 character data, and since it is not valid UTF-8, each invalid byte is replaced by U+FFFD.
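To make that concrete, here is a minimal sketch in plain Java (which Groovy runs on). It takes the first eight bytes of the good JPEG from the question, decodes them as UTF-8, encodes the result again, and prints the bytes the way od -x would on a little-endian machine. The class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

class ReplacementCharDemo {
    // Decode the bytes as UTF-8 text, then encode that text back to bytes.
    // Invalid input bytes become U+FFFD, which encodes as EF BF BD.
    static byte[] roundTripUtf8(byte[] raw) {
        String asText = new String(raw, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    // Print bytes as 16-bit words with the two bytes swapped,
    // mimicking od -x output on a little-endian machine.
    static String odWords(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i += 2) {
            int lo = bytes[i] & 0xff;
            int hi = (i + 1 < bytes.length) ? (bytes[i + 1] & 0xff) : 0;
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x%02x", hi, lo));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // FF D8 FF E0 00 10 'J' 'F': the start of the JPEG on the dev box.
        byte[] jpegStart = {(byte) 0xff, (byte) 0xd8, (byte) 0xff, (byte) 0xe0,
                            0x00, 0x10, 0x4a, 0x46};
        System.out.println(odWords(jpegStart));
        System.out.println(odWords(roundTripUtf8(jpegStart)));
    }
}
```

The first line printed is "d8ff e0ff 1000 464a" and the second is "bfef efbd bdbf bfef efbd bdbf 1000 464a", which matches the two od dumps in the question.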
Almost certainly your production box uses UTF-8 as its platform default encoding, while the dev box uses a single-byte ISO-8859 encoding (such as ISO-8859-1) in which every byte maps to a character and back without loss.
But the problem here is not the use of the platform default encoding. The problem is that your binary data is converted to character data and back. And that's almost certainly the fault of your code. How do you read the images / create and fill the out variable?
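A small Java sketch of why the same code can appear to work on one machine and not on another: it pushes all 256 possible byte values through a bytes-to-String-to-bytes round trip under each charset. The class and method names are illustrative only.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class CharsetRoundTrip {
    // True if decoding the bytes with cs and encoding them again
    // gives back exactly the same bytes.
    static boolean survivesRoundTrip(byte[] raw, Charset cs) {
        byte[] back = new String(raw, cs).getBytes(cs);
        return Arrays.equals(raw, back);
    }

    // One of every possible byte value, 0x00 through 0xFF.
    static byte[] allByteValues() {
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }
        return all;
    }

    public static void main(String[] args) {
        byte[] all = allByteValues();
        System.out.println("ISO-8859-1: " + survivesRoundTrip(all, StandardCharsets.ISO_8859_1));
        System.out.println("UTF-8:      " + survivesRoundTrip(all, StandardCharsets.UTF_8));
    }
}
```

ISO-8859-1 reports true and UTF-8 reports false. The round trip is still a bug under ISO-8859-1; it just happens to be invisible there.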
EDIT:
Looking at the code, there doesn't seem to be anything obviously wrong. But I'm a bit suspicious of those shift operators and Groovy's type handling and implicit conversions in regard to the overloaded leftShift() method of OutputStream. To pinpoint the problem, try looking at the contents of the ByteArrayOutputStream as well as reading the first bytes directly from the app server, to see where exactly things go wrong.
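One way to do that check, sketched in Java: dump the first few bytes of the buffer as hex and test for the JPEG start-of-image marker FF D8. The helper names firstBytesHex and startsWithJpegMarker are hypothetical, and the sample bytes in main stand in for whatever the real ByteArrayOutputStream holds.

```java
import java.io.ByteArrayOutputStream;

class FirstBytesCheck {
    // Format up to the first n bytes of data as space-separated hex pairs.
    static String firstBytesHex(byte[] data, int n) {
        StringBuilder sb = new StringBuilder();
        int limit = Math.min(n, data.length);
        for (int i = 0; i < limit; i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("%02x", data[i] & 0xff));
        }
        return sb.toString();
    }

    // True if data begins with the JPEG start-of-image marker FF D8.
    static boolean startsWithJpegMarker(byte[] data) {
        return data.length >= 2
                && (data[0] & 0xff) == 0xff
                && (data[1] & 0xff) == 0xd8;
    }

    public static void main(String[] args) {
        // Stand-in for the buffer filled from the internal app server.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] sample = {(byte) 0xff, (byte) 0xd8, (byte) 0xff, (byte) 0xe0, 0x00, 0x10};
        out.write(sample, 0, sample.length);

        byte[] bytes = out.toByteArray();
        System.out.println(firstBytesHex(bytes, 16));
        System.out.println("JPEG marker present: " + startsWithJpegMarker(bytes));
    }
}
```

If the buffer still starts with ff d8 but the HTTP response does not, the corruption happens when the buffer is written to the response, not when it is read.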
Or maybe the problem is further down the line. IIRC, Grails uses SiteMesh to provide modular layouts. Perhaps that's the culprit, trying to parse the controller's output as HTML. Not sure how to switch it off, though.
I fixed it!
Many thanks to Michael Borgwardt who got me pointed in the right direction.
I changed this:
if (request.method == 'HEAD') {
    render(text: "", contentType: "image/jpeg")
} else {
    response.contentType = "image/jpeg"
    response.outputStream << out
}
to this:
if (request.method == 'HEAD') {
    render(text: "", contentType: "image/jpeg")
} else {
    response.contentType = "image/jpeg"
    response.outputStream << out.toByteArray()
}
(Note the "toByteArray()".) That prevented groovy/grails/java/spring/hibernate/tomcat, or whatever gets in the way, from deciding to re-encode my binary data.
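A likely explanation, offered as an assumption to check against the Groovy GDK documentation: when the right-hand side of << on an OutputStream is an arbitrary object, Groovy appears to write the object's string form through a Writer, whereas a byte[] is written as raw bytes. For a ByteArrayOutputStream the string form is its contents decoded with the platform default charset. The Java sketch below imitates both paths with hypothetical helper names; it is not Groovy's actual implementation.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class LeftShiftPaths {
    // Imitates "sink << buffer": the buffer is turned into text using the
    // given charset (standing in for the platform default) and written
    // through a Writer, which encodes the text back to bytes.
    static byte[] writeViaString(ByteArrayOutputStream buffer, Charset platformDefault) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(sink, platformDefault)) {
            writer.write(new String(buffer.toByteArray(), platformDefault));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    // Imitates "sink << buffer.toByteArray()": bytes are copied untouched.
    static byte[] writeRawBytes(ByteArrayOutputStream buffer) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        byte[] raw = buffer.toByteArray();
        sink.write(raw, 0, raw.length);
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        byte[] jpegStart = {(byte) 0xff, (byte) 0xd8, (byte) 0xff, (byte) 0xe0, 0x00, 0x10};
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        buffer.write(jpegStart, 0, jpegStart.length);

        System.out.println("raw bytes intact:        "
                + Arrays.equals(jpegStart, writeRawBytes(buffer)));
        System.out.println("via String (UTF-8):      "
                + Arrays.equals(jpegStart, writeViaString(buffer, StandardCharsets.UTF_8)));
        System.out.println("via String (ISO-8859-1): "
                + Arrays.equals(jpegStart, writeViaString(buffer, StandardCharsets.ISO_8859_1)));
    }
}
```

Only the UTF-8 string path damages the data, which would fit a production box defaulting to UTF-8 and a dev box defaulting to a single-byte encoding.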