开发者

JAXB2 Mtom attachment broken by BOM

I'm using JAXB2 to to do OXM in a Spring-WS. The XSD I've specified requires a large XML file to be attached to the soap message so I'm using MTOM to transfer the file and have enabled MTOM on my JAXB2Marshaller.

When JAXB2 marshalls an MTOM attachment which has an expected mime type of text/xml it delivers that element as a javax.xml.transform.Source object. After some searching I was able to find out how I can send that Source object to a file.

final Source source = request.getSource();
StreamSource streamSource = (StreamSource) source;
TransformerFactory factory = TransformerFactory.newInstance();
Transformer transformer = factory.newTransformer();
File file = new File ("/tempxmlfile.xml");
try{
    transformer.transform(streamSource, new StreamResult(file));
    LOG.info("File saved in "+file.getAbso开发者_高级运维lutePath());
    }
catch(Exception ex){
        ex.getMessage();
    }

The problem I am having is that when I send a UTF-8 encoded file as the attachment I get the following error:

[Fatal Error] :1:1: Content is not allowed in prolog.
ERROR:  'Content is not allowed in prolog.'

This is being caused by a Byte Order Mark in front of the encoded text in the file, although this BOM is not required in a UTF-8 encoded file it is allowed by the Unicode standard, Java does not support BOMs in UTF-8 encoded streams.

I can solve this problem by sending a file without the BOM but this is not really feasible as it will cause problems with most Microsoft products which do insert the BOM.

There are lots of workarounds for Sun/Oracle's refusal to fix this issue with the Streams but they all require you to have access to the Stream, the Source Object provided by JAXB2 does not have an InputStream it only has a Reader object. Is there a way for me to solve this problem, either by wrapping the Sources Reader object with a reader which knows how to ignore a BOM in UTF-8 encoding or to change the way JAXB2 reads the attachment into the source so that it can ignore the BOM in UTF-8 encoding.

Thanks in advance, Craig


The trick is to "mark" the Reader. If your reader does not support marking you can wrap it in a BufferedReader which does:

  • http://download.oracle.com/javase/6/docs/api/java/io/BufferedReader.html#markSupported%28%29

OPTION #1 - Check for BOM and Remove It

I believe my original code was writing the BOM incorrectly. The source code below makes more sense:

import java.io.*;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class Demo {

    private static char[] UTF32BE = {0x00, 0x00, 0xFE, 0xFF}; 
    private static char[] UTF32LE = {0xFF, 0xFE, 0x00, 0x00};
    private static char[] UTF16BE = {0xFE, 0xFF}; 
    private static char[] UTF16LE = {0xFF, 0xFE};
    private static char[] UTF8 = {0xEF, 0xBB, 0xBF};

    public static void main(String[] args) throws Exception {
        // Create an XML document with a BOM
        FileOutputStream fos = new FileOutputStream("bom.xml");
        writeBOM(fos, UTF16LE);

        OutputStreamWriter oswUTF8 = new OutputStreamWriter(fos, "UTF-8");
        oswUTF8.write("<root/>");
        oswUTF8.close();

        // Create a Source based on a Reader to simulate source.getRequest()
        StreamSource attachment = new StreamSource(new FileReader(new File("bom.xml")));

        // Wrap reader in BufferedReader so it will support marking
        Reader reader = new BufferedReader(attachment.getReader());

        // Remove the BOM
        removeBOM(reader);

        TransformerFactory tf = TransformerFactory.newInstance();
        Transformer t = tf.newTransformer();
        t.transform(new StreamSource(reader), new StreamResult(System.out));
    }

    private static void writeBOM(OutputStream os, char[] bom) throws Exception {
        for(int x=0; x<bom.length; x++) {
            os.write((byte) bom[x]);
        }
    }

    private static void removeBOM(Reader reader) throws Exception {
        if(removeBOM(reader, UTF32BE)) {
            return;
        }
        if(removeBOM(reader, UTF32LE)) {
            return;
        }
        if(removeBOM(reader, UTF16BE)) {
            return;
        }
        if(removeBOM(reader, UTF16LE)) {
            return;
        }
        if(removeBOM(reader, UTF8)) {
            return;
        }
    }

    private static boolean removeBOM(Reader reader, char[] bom) throws Exception {
        int bomLength = bom.length;
        reader.mark(bomLength);
        char[] possibleBOM = new char[bomLength];
        reader.read(possibleBOM);
        for(int x=0; x<bomLength; x++) {
            if(bom[x] != possibleBOM[x]) {
                reader.reset();
                return false;
            }
        }
        return true;
    }

}

OPTION #2 - Find '<' and Advance Reader to that Point

Read until you hit the '<' leveraging mark/reset:

import java.io.*;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class Demo2 {

    private static char[] UTF32BE = {0x00, 0x00, 0xFE, 0xFF}; 
    private static char[] UTF32LE = {0xFF, 0xFE, 0x00, 0x00};
    private static char[] UTF16BE = {0xFE, 0xFF}; 
    private static char[] UTF16LE = {0xFF, 0xFE};
    private static char[] UTF8 = {0xEF, 0xBB, 0xBF};

    public static void main(String[] args) throws Exception {
        // Create an XML document with a BOM
        FileOutputStream fos = new FileOutputStream("bom.xml");
        writeBOM(fos, UTF16BE);

        OutputStreamWriter oswUTF8 = new OutputStreamWriter(fos, "UTF-8");
        oswUTF8.write("<root/>");
        oswUTF8.close();

        // Create a Source based on a Reader to simulate source.getRequest()
        StreamSource attachment = new StreamSource(new FileReader(new File("bom.xml")));

        // Wrap reader in BufferedReader so it will support marking
        Reader reader = new BufferedReader(attachment.getReader());

        // Remove the BOM
        removeBOM(reader);

        TransformerFactory tf = TransformerFactory.newInstance();
        Transformer t = tf.newTransformer();
        t.transform(new StreamSource(reader), new StreamResult(System.out));
    }

    private static void writeBOM(OutputStream os, char[] bom) throws Exception {
        for(int x=0; x<bom.length; x++) {
            os.write((byte) bom[x]);
        }
    }

    private static Reader removeBOM(Reader reader) throws Exception {
        reader.mark(1);
        char[] potentialStart = new char[1];
        reader.read(potentialStart);
        if('<' == potentialStart[0]) {
            reader.reset();
            return reader;
        } else {
            return removeBOM(reader);
        }
    }

}
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜