
How can I "flatten" text that contains macrons and umlauts in .NET? [duplicate]

This question already has answers here: Closed 11 years ago.

Possible Duplicate:

How to convert a Unicode character to its ASCII equivalent

How do I remove diacritics (accents) from a string in .NET?

I need to make a search form insensitive to text that contains macrons, umlauts, etc.

For example, "ŌōṒṓṐṑȪȫ" should be considered equal to "oooooooo".

In T-SQL I'm able to get it partially working with:

select Cast('ŌōṒṓṐṑȪȫ' as varchar)

which returns Oo??????. It is smart enough to translate the first two characters to "O" and "o".

I was trying to use this C# code to "flatten" the text but it doesn't work at all. The result is "????????".

var text = "ŌōṒṓṐṑȪȫ";
var buffer = Encoding.Convert(Encoding.Unicode, Encoding.ASCII, Encoding.Unicode.GetBytes(text));

var result = Encoding.ASCII.GetString(buffer);

Is there a way to do this in .NET? I know I could create a map that links characters such as "ŌōṒṓṐṑȪȫ" to "o" and so on for other characters, but I'm hoping there is already a built-in way to do this.


EDIT:
Ignore the original. The String class has a set of overloaded Normalize() methods.
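
Normalize() on its own does not drop the marks. The usual approach is to decompose the string (form D), which splits each accented letter into a base letter followed by combining marks, and then filter those marks out. A minimal sketch of that idea follows; the class and method names are hypothetical, and it only handles diacritics that decompose canonically (letters such as "ø" or "ł" have no decomposition and pass through unchanged):

```csharp
using System;
using System.Globalization;
using System.Text;

static class DiacriticsDemo
{
    // Hypothetical helper: strips combining marks after canonical decomposition.
    public static string RemoveDiacritics(string text)
    {
        // FormD splits e.g. "Ō" (U+014C) into "O" + U+0304 (combining macron).
        string decomposed = text.Normalize(NormalizationForm.FormD);

        var sb = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            // Keep everything except non-spacing (combining) marks.
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                sb.Append(c);
            }
        }

        // Recompose whatever is left so the result is in the common form C.
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }
}
```

For the string in the question, DiacriticsDemo.RemoveDiacritics("ŌōṒṓṐṑȪȫ") should give "OoOoOoOo"; calling ToLowerInvariant() on both sides before comparing would then make it match "oooooooo".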

ORIGINAL:

I don't know of any method built in to .NET; however, with these two articles and a little Win32 P/Invoke you should be set:

See section 4.3: Normalization

Win32 Unicode overview


You don't need to normalize the text: that is time-consuming, and there is a better option.

Most string comparison operations have a flavor that takes a CompareOptions. You can use this for CompareOptions:

static_cast<CompareOptions>(CompareOptions::IgnoreCase | CompareOptions::IgnoreNonSpace)

See the CompareInfo class http://msdn.microsoft.com/en-us/library/2z428sw8.aspx
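
The expression above is in C++/CLI syntax; in C# the flags are combined with the `.` member operator and no cast is needed. Here is a rough sketch of how this could be used for a search form, assuming the invariant culture is acceptable (a specific CultureInfo may suit your data better). The class and method names are hypothetical, and the exact matching behavior depends on the globalization data available to the runtime:

```csharp
using System;
using System.Globalization;

static class AccentInsensitiveDemo
{
    // Ignore both case and non-spacing marks (macrons, umlauts, accents).
    private const CompareOptions Options =
        CompareOptions.IgnoreCase | CompareOptions.IgnoreNonSpace;

    // Hypothetical helper: whole-string equality, ignoring case and diacritics.
    public static bool EqualsIgnoringAccents(string a, string b)
    {
        CompareInfo compareInfo = CultureInfo.InvariantCulture.CompareInfo;
        return compareInfo.Compare(a, b, Options) == 0;
    }

    // Hypothetical helper: substring search, ignoring case and diacritics.
    public static bool ContainsIgnoringAccents(string haystack, string needle)
    {
        CompareInfo compareInfo = CultureInfo.InvariantCulture.CompareInfo;
        return compareInfo.IndexOf(haystack, needle, Options) >= 0;
    }
}
```

With these helpers, AccentInsensitiveDemo.EqualsIgnoringAccents("ŌōṒṓṐṑȪȫ", "oooooooo") is expected to report a match without building a stripped copy of either string. If the application runs in globalization-invariant mode, comparisons fall back to ordinal behavior and the marks will not be ignored.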
