Any idea how to enforce UTF-8 within a document?
I am creating an XML document and attempting to store it as UTF-8. However, I am getting a non-UTF-8 apostrophe within the stored document.
e.g.: <Name=Dave t="Owner(e.g pete’s)">
I have tried the following:
System.Text.UTF8Encoding encoding = new System.Text.UTF8Encoding();
var docX = encoding.GetBytes(vdd.ToString());
System.IO.StreamWriter s = new StreamWriter(pathAndFileName, false, encoding);
string myString = encoding.GetString(docX);
s.Write(myString);
Which should have been overkill, but the '’' inside of the brackets is still showing. I have also tried htmlencode, which didn't help.
The XML reads fine as UTF-8 in Notepad++, but the ’ character is not parsing on all of my clients' systems.
Help please.....
EDIT: Dour noted something I missed in all the confusion; the sample you pasted is not XML at all, and therefore will not parse. My answer still applies insofar as 'html encoding' and UTF8 encoding were the wrong roads to be going down here.
It's difficult to tell exactly what your problem is, but after eliminating some of the possibilities I've arrived at one likely cause: the ’ is causing your XML not to be parsed correctly.
This is not an encoding problem. As The Skeet notes, UTF-8 can represent all Unicode characters, including that one. Instead, this is an... umm... an encoding problem. That is: an XML data encoding problem.
The character should be attribute encoded, not HTML encoded.
What API are you using to build the XML? It should do this for you, so you don't need to worry about what to encode, how, and why. But if you attribute encode the ’ character, I think your problem will cease.
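To illustrate the point about letting an API do the encoding, here is a minimal sketch using LINQ to XML. The class name `OwnerXml`, the element name `Owner`, and the attribute names `name` and `t` are assumptions made up for this example; the original post does not say which API or schema is in use.

```csharp
using System.Xml.Linq;

public static class OwnerXml
{
    // Hypothetical element and attribute names, chosen only for illustration.
    public static XElement BuildOwner(string name, string description)
    {
        // XAttribute escapes markup characters such as & and " when the
        // element is serialized, so no manual encoding step is needed.
        // The typographic apostrophe (U+2019) is ordinary character data.
        return new XElement("Owner",
            new XAttribute("name", name),
            new XAttribute("t", description));
    }
}
```

Calling `OwnerXml.BuildOwner("Dave", "Owner (e.g. pete’s)").ToString()` should give a well-formed element that any conforming XML parser can read back, with the attribute value intact.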
Assuming I understand your problem...
<Name=Dave t="Owner(e.g pete’s)">
This is not XML: the '=' is illegal in a tag name. If it's supposed to be an attribute, its value must be quoted. The element is also unterminated and there is no XML declaration; if this is what you're trying to output, you're not outputting XML. The ’ character is allowed both in UTF-8 and in XML attribute values.
System.Text.UTF8Encoding encoding = new System.Text.UTF8Encoding();
var docX = encoding.GetBytes(vdd.ToString());
docX is a byte array holding the UTF-8 encoding of vdd.ToString(). If that string contains anything that is not valid Unicode (an unpaired surrogate, for example), it will not survive the conversion.
System.IO.StreamWriter s = new StreamWriter(pathAndFileName, false, encoding);
You're opening a UTF-8-encoded output stream, fair enough...
string myString = encoding.GetString(docX);
Now you're converting your UTF-8-encoded array back into a C# string. Why?
s.Write(myString);
Now you're writing the C# string back to a UTF-8 stream, which does a second UTF-8 conversion. The round trip through the byte array accomplishes nothing. Please explain what you're trying to accomplish.
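For comparison, the walkthrough above suggests the string only needs to pass through the encoder once. The sketch below is one way to do that; the class name `Utf8File` and its `Save` method are hypothetical, and the choice to omit the byte-order mark is an assumption about what the consumers expect.

```csharp
using System.IO;
using System.Text;

public static class Utf8File
{
    // Hypothetical helper: writes the text to disk as UTF-8 in a single pass.
    public static void Save(string path, string xmlText)
    {
        // false = do not emit a byte-order mark; pass true if consumers expect one.
        var utf8 = new UTF8Encoding(false);

        // Disposing the writer flushes buffered text and closes the file.
        using (var writer = new StreamWriter(path, false, utf8))
        {
            writer.Write(xmlText);
        }
    }
}
```

With this helper, the original snippet would reduce to something like `Utf8File.Save(pathAndFileName, vdd.ToString());`. The `using` block matters: a `StreamWriter` that is never flushed or disposed can leave the file truncated.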
the ’ character is not parsing on all of my clients' systems
Then your clients' systems are not accepting UTF-8. Either fix them, or find out what encoding they do accept and use that.
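If it turns out the clients can only handle ASCII, one option is to have `XmlWriter` produce the output in that encoding. This is a sketch under that assumption, not a diagnosis of the clients' actual setup; the class name `AsciiXml` and the method `ToAsciiBytes` are made up for illustration. When the target encoding cannot represent a character, `XmlWriter` writes it as a numeric character reference, and the XML declaration it emits names the encoding in use.

```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

public static class AsciiXml
{
    // Hypothetical helper: serializes the element as ASCII-only XML.
    public static byte[] ToAsciiBytes(XElement root)
    {
        // The writer takes its output encoding (and declaration) from the settings.
        var settings = new XmlWriterSettings { Encoding = Encoding.ASCII };

        using (var stream = new MemoryStream())
        {
            // Disposing the writer flushes everything into the stream.
            using (var writer = XmlWriter.Create(stream, settings))
            {
                root.Save(writer);
            }
            return stream.ToArray();
        }
    }
}
```

A conforming parser resolves the character reference back to ’ on load, so the data is unchanged; only the bytes on disk differ.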