Passing a string from C# to cpp with COM
I have a C# COM server which is consumed by a cpp client.
One of the C# methods returns a string.
In cpp the returned string is represented in Unicode (UTF-16), at least according to the memory view.
- Is this always the case with COM strings?
- Is there a way to use UTF-8 instead?
- I saw some code where strings were passed between cpp and C# as byte arrays. Is there any benefit in this?
- Yes. The standard COM string type is BSTR. It is a Unicode string encoded in UTF-16, just like Windows' native string type.
- No, a COM method isn't going to understand a UTF-8 string. It will read the bytes as UTF-16 code units, which typically turns the text into Chinese. UTF-8 is a good encoding for a text file, not for programs manipulating strings in memory. UTF-8 requires anywhere between 1 and 4 bytes to encode a Unicode code point, which makes it awkward for basic string manipulations like getting the length or indexing a character.
- C and C++ programs tend to use 8-bit encodings, compatible with the "char" type. That's an old practice, dating back to an era before Unicode existed. There's nothing attractive about it: there are many different 8-bit encodings, and text can only be interpreted correctly if it is read by a program that uses the same 8-bit encoding as the program that wrote it. In other words, when the computers are less than 1000 miles apart. Less in Europe.
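If the C++ client needs UTF-8 internally, the approach consistent with the first two points is to keep BSTR on the COM boundary and convert on the client side. On Windows, WideCharToMultiByte with CP_UTF8 does this conversion. The portable sketch below spells it out, which also makes the "1 to 4 bytes per code point" point concrete. The function name Utf16ToUtf8 is hypothetical, the code assumes the BSTR's characters have already been copied into a std::u16string, and replacing unpaired surrogates with U+FFFD is a design choice of this sketch, not something COM requires.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: converts UTF-16 (the encoding of BSTR contents) to UTF-8.
// On Windows, WideCharToMultiByte(CP_UTF8, ...) performs the same job.
// Unpaired surrogates are replaced with U+FFFD (a choice of this sketch).
std::string Utf16ToUtf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // High surrogate followed by low surrogate: combine into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // lone surrogate
        }
        if (cp < 0x80) {                                   // 1 byte
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {                           // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {                         // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                                           // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

For example, `Utf16ToUtf8(u"A\u00E9\u20AC\U0001F600")` produces 10 bytes (1 + 2 + 3 + 4) for 4 code points, so `size()` on the UTF-8 result is a byte count, not a character count.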
- No.
- Yes. Put the attribute
[return: MarshalAs(UnmanagedType.LPStr)]
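before the method definition in C# if you'd like to return the string as an ANSI string instead of Unicode. Note that LPStr converts using the system's ANSI code page, which in general is not UTF-8; newer .NET versions also define UnmanagedType.LPUTF8Str for UTF-8 marshaling. Either way, the C++ client receives a char* allocated by the marshaler and must free it with CoTaskMemFree.
- Yeah, the author may have done that to keep very fine-grained control over the encoding of the string's contents by side-stepping the default marshaling behavior.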
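A minimal sketch of the receiving side of the byte-array approach, assuming the C# server filled the array with Encoding.UTF8.GetBytes and the C++ client has already extracted a data pointer and element count (from a SAFEARRAY of bytes, for instance). Both function names are hypothetical. Because the encoding is fixed by agreement between the two sides rather than chosen by the marshaler, the bytes are copied verbatim with no conversion. The trade-off is that the client now does its own UTF-8 handling, as the code point counter illustrates.

```cpp
#include <cstddef>
#include <string>

// Hypothetical: wraps a byte buffer received from the C# side.
// Assumes the bytes are already UTF-8, so they are copied verbatim.
std::string Utf8FromBytes(const unsigned char* data, std::size_t count) {
    return std::string(reinterpret_cast<const char*>(data), count);
}

// Hypothetical: counts code points by skipping UTF-8 continuation bytes
// (bit pattern 10xxxxxx). Assumes well-formed UTF-8 input.
std::size_t CountCodePoints(const std::string& utf8) {
    std::size_t n = 0;
    for (unsigned char b : utf8) {
        if ((b & 0xC0) != 0x80) ++n;
    }
    return n;
}
```

With the bytes of "café" (63 61 66 C3 A9), the string holds 5 bytes but 4 code points.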