开发者

Convert Unicode char to closest (most similar) char in ASCII (.NET)

How do I to convert different Unicode characters to their 开发者_开发百科closest ASCII equivalents? Like Ä -> A. I googled but didn't find any suitable solution. The trick Encoding.ASCII.GetBytes("Ä")[0] didn't work. (Result was ?).

I found that there is a class Encoder that has a Fallback property that is exactly for cases when char can't be converted, but implementations (EncoderReplacementFallback) are stupid and convert to ?.

Any ideas?


If it is just removing of the diacritical marks, then head to this answer:

static string RemoveDiacritics(string stIn) {
  string stFormD = stIn.Normalize(NormalizationForm.FormD);
  StringBuilder sb = new StringBuilder();

  for(int ich = 0; ich < stFormD.Length; ich++) {
    UnicodeCategory uc = CharUnicodeInfo.GetUnicodeCategory(stFormD[ich]);
    if(uc != UnicodeCategory.NonSpacingMark) {
      sb.Append(stFormD[ich]);
    }
  }

  return(sb.ToString().Normalize(NormalizationForm.FormC));
}


MS Dynamics has a problem where it won't allow for any character outside of x20 to x7f and some characters within that range are also invalid. My answer was to create an array keyed to the invalid characters returning the best guess of the valid characters.
It ain't pretty, but it works.

Function PlainAscii(InText)
Dim i, c, a
Const cUTF7 = "^[\x20-\x7e]+$"
Const IgnoreCase = False
    PlainAscii = ""
    If InText = "" Then Exit Function
    If RegExTest(InText, cUTF7, IgnoreCase) Then
        PlainAscii = InText
    Else
        For i = 1 To Len(InText)
            c = Mid(InText, i, 1)
            a = Asc(c)
            If a = 10 Or a = 13 Or a = 9 Then
                ' Do Nothing - Allow LF, CR & TAB
            ElseIf a < 32 Then
                c = " "
            ElseIf a > 126 Then
                c = CvtToAscii(a)
            End If
            PlainAscii = PlainAscii & c
        Next
    End If
End Function

Function CvtToAscii(inChar)
' Maps The Characters With The 8th Bit Set To 7 Bit Characters
Dim arrChars
    arrChars = Array(" ", " ", "$", " ", ",", "f", """", " ", "t", "t", "^", "%", "S", "<", "O", " ", "Z", " ", " ", "'", "'", """", """", ".", "-", "-", "~", "T", "S", ">", "o", " ", "Z", "Y", " ", "!", "$", "$", "o", "$", "|", "S", " ", "c", " ", " ", " ", "_", "R", "_", ".", " ", " ", " ", " ", "u", "P", ".", ",", "i", " ", " ", " ", " ", " ", " ", "A", "A", "A", "A", "A", "A", "A", "C", "E", "E", "E", "E", "I", "I", "I", "I", "D", "N", "O", "O", "O", "O", "O", "X", "O", "U", "U", "U", "U", "Y", "b", "B", "a", "a", "a", "a", "a", "a", "a", "c", "e", "e", "e", "e", "i", "i", "i", "i", "o", "n", "o", "o", "o", "o", "o", "/", "O", "u", "u", "u", "u", "y", "p", "y")
    CvtToAscii = arrChars(inChar - 127)
End Function

Function RegExTest(ByVal strStringToSearch, strExpression, IgnoreCase)
Dim objRegEx
    On Error Resume Next
    Err.Clear
    strStringToSearch = Replace(Replace(strStringToSearch, vbCr, ""), vbLf, "")
    RegExTest = False
    Set objRegEx = New RegExp
    With objRegEx
        .Pattern = strExpression    '//the reg expression that should be searched for
        If Err.Number = 0 Then
            .IgnoreCase = CBool(IgnoreCase)    '//not case sensitive
            .Global = True              '//match all instances of pattern
            RegExTest = .Test(strStringToSearch)
        End If
    End With
    Set objRegEx = Nothing
    On Error Goto 0
End Function

Your answer is necessarily going to be different.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜