Java - Send a UTF-8 string that may contain illegal XML characters via a web service

I have a web service written in Java, and I want to send some strings as an XML document. These strings may contain characters that are illegal in XML. Currently I replace each of them with ?, build the XML, and send it over the network to a Silverlight app. But sometimes all I get is question marks! So I want to encode the strings before sending them and decode them on arrival, so that the client gets the exact original strings. The strings are UTF-8 encoded. I'm using something like this to create the XML:

try{
    DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder docBuilder = docFactory.newDocumentBuilder();

    //root elements
    Document doc = docBuilder.newDocument();
    Element rootElement = doc.createElement("SearchResults");
    rootElement.setAttribute("count", Integer.toString(total));
    doc.appendChild(rootElement);

    for(int i = 0; i < results.size(); i++)
    {
        Result res = results.get(i);
        //title
        Element title = doc.createElement("Title");
        title.appendChild(doc.createTextNode(res.title));
        searchRes.appendChild(title);

        //...
    }
    //write the content into xml file
    TransformerFactory transformerFactory = TransformerFactory.newInstance();
    Transformer transformer = transformerFactory.newTransformer();
    DOMSource source = new DOMSource(doc);
    StringWriter sw = new StringWriter();
    StreamResult result =  new StreamResult(sw);
    transformer.transform(source, result);
    String ret = sw.toString();
    return ret;
}
catch(ParserConfigurationException pce){
    pce.printStackTrace();
}catch(TransformerException tfe){
    tfe.printStackTrace();
}
return null;

Thank you.

PS: Some people said they didn't understand my question, so let me clarify it with an example. Suppose I have an array of items, and each item has 3 strings. These strings are UTF-8 strings in many languages. I want to send them to the client via a web service written in Java. The client is a Silverlight app, which receives the XML, parses it, and uses LINQ to extract the data.

When I try to escape the characters, the parser in Silverlight throws an exception saying that there is an illegal character in the source XML string. After debugging I found that there really IS an illegal character, but I don't know how to produce an XML string that is guaranteed to be legal.

Edit: Thank you all for your support. I REALLY appreciate it.

I solved my problem. It turns out that somewhere in my code I was producing an illegal character and appending it to my result string.

The question still remains: how can I produce a legal XML file even when I feed it illegal characters? I solved my problem by removing the illegal character before producing the XML, so I still wonder what I would do if I actually wanted to send it. But since my problem is solved and there are plenty of answers here, future readers should have a head start on this problem.

I didn't have time to try them all, but I'm sure these answers will help. There are many helpful answers, so it is hard to single one out as the accepted answer, but I have to choose one. I sincerely thank everyone who responded.
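
For the fix described in the edit (removing illegal characters before building the XML), here is a minimal sketch. It assumes the target is XML 1.0, whose allowed characters are tab, line feed, carriage return, U+0020 to U+D7FF, U+E000 to U+FFFD, and U+10000 to U+10FFFF. The names XmlCharFilter, isLegalXmlChar and stripIllegalXmlChars are made up for illustration and are not part of any library.

```java
final class XmlCharFilter {

    /** Returns true if the code point is allowed by the XML 1.0 Char production. */
    static boolean isLegalXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns a copy of the input with every character that XML 1.0 forbids removed. */
    static String stripIllegalXmlChars(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); ) {
            // Work on code points so that valid surrogate pairs are kept together;
            // an unpaired surrogate comes back as itself and is dropped.
            int cp = input.codePointAt(i);
            if (isLegalXmlChar(cp)) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

In the code from the question this would be applied where the text node is created, for example doc.createTextNode(XmlCharFilter.stripIllegalXmlChars(res.title)). Note that this loses the removed characters; it does not transport them.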


If you're sending non-character data (binary data, for example) in your XML, you could encode it using Base64. But I'm not sure I've understood your question correctly.
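
A minimal sketch of the Base64 idea, assuming Java 8 or later (for java.util.Base64) and that the string is turned into UTF-8 bytes first. The class name Base64Text and its two methods are hypothetical helpers, not an existing API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class Base64Text {

    /** Encodes the string's UTF-8 bytes as Base64, which uses only XML-safe ASCII characters. */
    static String encode(String raw) {
        return Base64.getEncoder().encodeToString(raw.getBytes(StandardCharsets.UTF_8));
    }

    /** Reverses encode(): decodes the Base64 text and reads the bytes as UTF-8. */
    static String decode(String encoded) {
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
    }
}
```

The encoded value can then be placed in a text node as usual, and the client has to decode it again (in .NET, Convert.FromBase64String followed by Encoding.UTF8.GetString should do this). Control characters such as U+0000 survive the round trip; an unpaired surrogate does not, because getBytes replaces it when encoding to UTF-8.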

Maybe you just forgot to encode your XML in UTF-8, using transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8").
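
The code in the question writes to a StringWriter, so it returns a Java String, and the bytes that reach the client depend on how that string is encoded later. A minimal sketch of the suggestion above, assuming you can return bytes instead, is to serialize to a byte stream with the encoding set explicitly. Utf8XmlWriter, toUtf8Bytes and sampleDocument are names invented for this example.

```java
import java.io.ByteArrayOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class Utf8XmlWriter {

    /** Serializes the DOM document to bytes, asking the transformer for UTF-8 output. */
    static byte[] toUtf8Bytes(Document doc) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            // Writing to a byte stream lets the transformer do the character encoding itself.
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toByteArray();
        } catch (TransformerException e) {
            throw new IllegalStateException("Could not serialize the XML document", e);
        }
    }

    /** Builds a tiny document shaped like the one in the question (illustration only). */
    static Document sampleDocument(String titleText) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("SearchResults");
            doc.appendChild(root);
            Element title = doc.createElement("Title");
            title.appendChild(doc.createTextNode(titleText));
            root.appendChild(title);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("Could not create a DocumentBuilder", e);
        }
    }
}
```

This only controls how legal characters are encoded. A character that XML itself forbids is still a problem for the parser on the other side.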


Not sure I understand your question, but maybe you should wrap the data in a CDATA section so that it's not parsed by the XML parser. See the CDATA documentation on MSDN.


Wrap the content with <![CDATA[ and ]]>.

More info here: http://www.w3schools.com/xml/xml_cdata.asp
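
A minimal sketch of the CDATA wrapping, assuming the XML is assembled as a plain string. The only sequence that cannot appear inside a CDATA section is ]]>, so the sketch splits it across two adjacent sections. CdataWrapper and wrapInCdata are hypothetical names.

```java
final class CdataWrapper {

    /**
     * Wraps text in a CDATA section. Any "]]>" inside the text is split so that
     * "]]" ends one section and ">" starts the next; a parser joins them again.
     */
    static String wrapInCdata(String text) {
        return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>";
    }
}
```

With the DOM code from the question, doc.createCDATASection(res.title) could be used instead of doc.createTextNode(res.title). Keep in mind that CDATA only stops markup such as < and & from being interpreted. It does not make characters outside the XML character range legal, so it would not have fixed the control-character case on its own.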


From experience, I would recommend escaping and unescaping the XML. Take a look at StringEscapeUtils from Apache Commons Lang.


You should try StringEscapeUtils from Apache Commons Lang.
