
C# Regular Expression Input String Problem

I have a problem with the following program: it compiles, but when I run it, it reports that the input string is not in the correct format. Could anyone assist?

        string path = @"C:/Documents and Settings/expn261/Desktop/CharacterTest/Output.xls";
        string strCharater = File.ReadAllText(path,UTF7Encoding.UTF7);

        strCharater = Regex.Replace(strCharater, "[èéèëêð]", "e");
        strCharater = Regex.Replace(strCharater, "[ÉÈËÊ]", "E");
        strCharater = Regex.Replace(strCharater, "[àâä]", "a");
        strCharater = Regex.Replace(strCharater, "[ÀÁÂÃÄÅ]", "A");
        strCharater = Regex.Replace(strCharater, "[àáâãäå]", "a");
        strCharater = Regex.Replace(strCharater, "[ÙÚÛÜ]", "U");
        strCharater = Regex.Replace(strCharater, "[ùúûüµ]", "u");
        strCharater = Regex.Replace(strCharater, "[òóôõöø]", "o");
        strCharater = Regex.Replace(strCharater, "[ÒÓÔÕÖØ]", "O");
        strCharater = Regex.Replace(strCharater, "[ìíîï]", "i");
        strCharater = Regex.Replace(strCharater, "[ÌÍÎÏ]", "I");
        strCharater = Regex.Replace(strCharater, "[š]", "s");
        strCharater = Regex.Replace(strCharater, "[Š]", "S");
        strCharater = Regex.Replace(strCharater, "[ñ]", "n");
        strCharater = Regex.Replace(strCharater, "[Ñ]", "N");
        strCharater = Regex.Replace(strCharater, "[ç]", "c");
        strCharater = Regex.Replace(strCharater, "[Ç]", "C");
        strCharater = Regex.Replace(strCharater, "[ÿ]", "y");
        strCharater = Regex.Replace(strCharater, "[Ÿ]", "Y");
        strCharater = Regex.Replace(strCharater, "[ž]", "z");
        strCharater = Regex.Replace(strCharater, "[Ž]", "Z");
        strCharater = Regex.Replace(strCharater, "[Ð]", "D");
        strCharater = Regex.Replace(strCharater, "[œ]", "oe");
        strCharater = Regex.Replace(strCharater, "[Œ]", "Oe");
        strCharater = Regex.Replace(strCharater, "[«»\u201C\u201D\u201E\u201F\u2033\u2036]", "\"");
        strCharater = Regex.Replace(strCharater, "[\u2026]", "...");

        string path2 = (@"C:/Documents and Settings/expn261/My Documents/CharacterReplaceTest.csv");
        StreamWriter sw = new StreamWriter(path2);
        sw.WriteLine(strCharater, UTF7Encoding.UTF7);


This is not very well known, but it works like a charm. It removes all diacritics.

// using System.Globalization; using System.Text;
public static string RemoveDiacritics(string s) {
    s = s.Normalize(NormalizationForm.FormD);
    StringBuilder sb = new StringBuilder();

    for (int i = 0; i < s.Length; i++) {
        if (CharUnicodeInfo.GetUnicodeCategory(s[i]) != UnicodeCategory.NonSpacingMark) sb.Append(s[i]);
    }

    return sb.ToString();
}
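
A minimal sketch of how this behaves on the question's characters. Normalizing to FormD only splits letters that have a canonical decomposition, so a few characters from the question (ø, œ, ð, Ð) pass through unchanged and would still need an explicit mapping. The class name `DiacriticsDemo` is a placeholder for illustration, not part of the answer above.

```csharp
using System.Globalization;
using System.Text;

public static class DiacriticsDemo
{
    // Same technique as the answer above: decompose, then drop combining marks.
    public static string RemoveDiacritics(string s)
    {
        string decomposed = s.Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(decomposed.Length);

        foreach (char c in decomposed)
        {
            // Accents become separate NonSpacingMark characters after FormD.
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                sb.Append(c);
        }

        return sb.ToString();
    }
}
```

For example, `DiacriticsDemo.RemoveDiacritics("Éèçñ")` yields `"Eecn"`, while `DiacriticsDemo.RemoveDiacritics("øœÐ")` returns its input unchanged because those letters have no decomposition.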


It looks like what you are trying to do is translate characters in a string. This is one of those cases where you might actually just want to write up a big switch statement:

var sb = new StringBuilder();
foreach (char c in strCharater) // could you choose a better name than strCharater?
{
    switch (c)
    {
       case 'è':
       case 'é':
          sb.Append('e');
          break;
       case 'ä':
       case 'à':
          sb.Append('a');
          break;
       // ... add the remaining characters here
       default:
          sb.Append(c); // keep every other character unchanged
          break;
    }
}
strCharater = sb.ToString();

This approach has the benefit of not creating tons of (immutable) strings that have to be allocated and garbage collected. Also, the JIT should compile this down to very fast code!
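
If the list of cases grows long, the same single-pass idea can be driven by a lookup table instead of a switch. This is a swapped-in variant, not the switch approach itself; the class name `CharTranslator` and the handful of entries below are illustrative assumptions, taken from the replacements in the question, and the table would need to be extended to cover the rest.

```csharp
using System.Collections.Generic;
using System.Text;

public static class CharTranslator
{
    // Hypothetical, partial table; values may be longer than one character.
    private static readonly Dictionary<char, string> Map = new Dictionary<char, string>
    {
        { 'ø', "o" }, { 'Ø', "O" },
        { 'ð', "e" },                 // the question maps ð to e
        { 'Ð', "D" },
        { 'œ', "oe" }, { 'Œ', "Oe" },
        { '\u2026', "..." },          // horizontal ellipsis
    };

    public static string Translate(string input)
    {
        var sb = new StringBuilder(input.Length);

        foreach (char c in input)
        {
            // Append the mapped text if there is one, otherwise the character itself.
            if (Map.TryGetValue(c, out string replacement))
                sb.Append(replacement);
            else
                sb.Append(c);
        }

        return sb.ToString();
    }
}
```

Like the switch, this walks the input once and builds one result string, so it avoids the intermediate strings created by a chain of `Regex.Replace` calls.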


When an exception is thrown, the runtime (not the compiler) records a stack trace: the chain of method calls leading from the first call down to the point where the exception was thrown. Check which line the stack trace points to and concentrate on that line only, instead of reviewing the entire block. :)
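
A minimal sketch of that advice, under one assumption about the question's code: the call `sw.WriteLine(strCharater, UTF7Encoding.UTF7)` binds to the `WriteLine(string format, object arg0)` overload, so the file contents are parsed as a composite format string and any stray `{` or `}` in them raises a `FormatException`. Whether the questioner's file actually contains a brace is not known. The sketch uses a `StringWriter` so it runs without touching the disk; `FormatDemo` and `ThrowsOnWrite` are hypothetical names.

```csharp
using System;
using System.IO;

public static class FormatDemo
{
    // Returns true if writing `text` through the (format, arg) overload throws.
    public static bool ThrowsOnWrite(string text)
    {
        try
        {
            using (var writer = new StringWriter())
            {
                // Same overload shape as in the question: the first argument
                // is treated as a format string, the second as its argument.
                writer.WriteLine(text, (object)"ignored");
            }
            return false;
        }
        catch (FormatException ex)
        {
            // The stack trace lists the frames leading to the throw,
            // which is how the failing line can be located.
            Console.Error.WriteLine(ex.GetType().Name);
            Console.Error.WriteLine(ex.StackTrace);
            return true;
        }
    }
}
```

If this turns out to be the cause, passing the encoding to the writer instead, for example `new StreamWriter(path2, false, encoding)`, and then calling `sw.WriteLine(strCharater)` with a single argument avoids the format-string overload.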
