How to handle character encoding with XML, JDom, JNI and C++
I am developing an application that reads in an XML document and passes the contents with JNI to a C++-DLL which validates it.
For this task I am using JDom and JUniversalChardet to parse the XML file in the correct encoding. My C++ code accepts a const char*
for the contents of the XML file and needs it in the encoding "ISO-8859-15", otherwise it throws an exception because of malformed characters.
My first approach was to use the shipped OutputFormatter of JDom and tell it to use Charset.forName("ISO-8859-15")
while formatting the JDom document to a String. After that the header part of the XML in this String says:
<?xml version="1.0" encoding="ISO-8859-15"?>
The problem is that it is still stored in a Java String, and therefore in UTF-16, if I got that right.
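In code, that formatting step looks roughly like this (just a sketch, assuming JDOM 2's XMLOutputter and Format classes; the method name toIsoString is only for illustration):

    import org.jdom2.Document;
    import org.jdom2.output.Format;
    import org.jdom2.output.XMLOutputter;

    // formats the parsed JDOM document to a String with an ISO-8859-15 declaration
    static String toIsoString(Document doc) {
        Format format = Format.getPrettyFormat();
        format.setEncoding("ISO-8859-15"); // only changes the declaration and escaping
        return new XMLOutputter(format).outputString(doc); // still a UTF-16 Java String in memory
    }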
My native method looks something like this:
public native String jniApiCall(String xmlFileContents);
So I pass the above-mentioned String from the OutputFormatter of JDom into this JNI method. Everything is still UTF-16 at this point, right?
In the JNI-C++ method I access the xmlFileContents String with
const string xmlDataString = env->GetStringUTFChars(xmlFileContents, NULL);
So, do I now have my above-mentioned String in UTF-16 or UTF-8? And my next question would be: how can I change the character encoding of the std::string xmlDataString
to ISO-8859-15? Or is the way I am doing this not exactly elegant? Or is there a way to do the character encoding completely in Java?
Thanks for your help! Marco
You can always convert any String to a byte array in the needed character encoding using the byte[] getBytes(Charset charset) method (or even byte[] getBytes(String charsetName)).
In Java you can use myString.getBytes("ISO-8859-15"); to get the byte array of the String in the character encoding passed as the parameter (in this case ISO-8859-15).
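A minimal sketch of the Java side, assuming you change the native method to accept a byte[] instead of a String (the method name jniApiCall comes from the question; the class name, the wrapper method and the library name are made up for illustration):

    import java.nio.charset.Charset;

    public class XmlValidator {
        static {
            System.loadLibrary("xmlvalidator"); // hypothetical name of the native DLL
        }

        // native method changed to take the raw bytes instead of a String
        public native String jniApiCall(byte[] xmlFileContents);

        public String validate(String xmlString) {
            // encode the XML as ISO-8859-15 bytes before crossing the JNI boundary
            byte[] xmlBytes = xmlString.getBytes(Charset.forName("ISO-8859-15"));
            return jniApiCall(xmlBytes);
        }
    }

The jbyteArray arriving on the native side then carries exactly the ISO-8859-15 bytes, so no further conversion is needed in C++.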
And then use that byte array on the C++ side to get the std::string. In the native method you can obtain the raw bytes and their length from the jbyteArray with GetByteArrayElements and GetArrayLength, and construct the string with an explicit length, since the byte array is not null-terminated, with something like:
std::string myNewstring(reinterpret_cast<const char*>(myByteArray), myByteArrayLength);