C# - Detecting encoding in a file, write change to file using the found encoding

2023-01-29 04:40

I wrote a small program that iterates through a lot of files and applies some changes wherever a certain string match is found. The problem is that different files have different encodings, so what I would like to do is check the encoding, then overwrite the file in its original encoding.

What would be the prettiest way of doing that in C# on .NET 2.0?

My code is very simple as of now:

String f1 = File.ReadAllText(fileList[i]).ToLower();

if (f1.Contains(oPath))
{
    f1 = f1.Replace(oPath, nPath);
    File.WriteAllText(fileList[i], f1, Encoding.Unicode);
}

I took a look at Auto encoding detect in C# which made me realize how I could detect encoding, but I am not sure how I could use that information to write in the same encoding.

Would greatly appreciate any help here.

Unfortunately, encoding is one of those subjects where there is not always a definitive answer. In many cases it's much closer to guessing the encoding than detecting it. Raymond Chen wrote an excellent blog post on this subject that is worth a read:

http://blogs.msdn.com/b/oldnewthing/archive/2007/04/17/2158334.aspx

The gist of the article is:

If the BOM (byte order mark) exists, you're golden
Otherwise it's guesswork and heuristics
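To make the "BOM exists" case concrete, here is a minimal sketch that compares the first bytes of a file against the preambles returned by Encoding.GetPreamble. The class and method names (BomSniffer, DetectFromBom) are hypothetical and not part of any answer in this thread; the sketch only recognises UTF-8, UTF-16, and UTF-32 marks and returns null for everything else.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, for illustration only.
public static class BomSniffer
{
    // Longest preambles first, so UTF-32 LE (FF FE 00 00) is tested
    // before UTF-16 LE (FF FE), which shares its first two bytes.
    private static readonly Encoding[] Candidates = new Encoding[]
    {
        Encoding.UTF32,                 // FF FE 00 00
        new UTF32Encoding(true, true),  // 00 00 FE FF
        Encoding.UTF8,                  // EF BB BF
        Encoding.Unicode,               // FF FE
        Encoding.BigEndianUnicode       // FE FF
    };

    // Returns the encoding whose BOM starts the buffer, or null if none matches.
    public static Encoding DetectFromBom(byte[] head)
    {
        foreach (Encoding candidate in Candidates)
        {
            byte[] bom = candidate.GetPreamble();
            if (bom.Length == 0 || head.Length < bom.Length) continue;

            bool match = true;
            for (int i = 0; i < bom.Length; i++)
            {
                if (head[i] != bom[i]) { match = false; break; }
            }
            if (match) return candidate;
        }
        return null;
    }

    // Reads at most the first four bytes of the file and checks them for a BOM.
    public static Encoding DetectFromBom(string path)
    {
        byte[] head = new byte[4];
        int read;
        using (FileStream stream = File.OpenRead(path))
        {
            read = stream.Read(head, 0, head.Length);
        }
        byte[] trimmed = new byte[read];
        Array.Copy(head, trimmed, read);
        return DetectFromBom(trimmed);
    }
}
```

A null result means the file carries no recognised BOM, which is exactly the point where the guesswork begins.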

However, I still think the best approach is the one Darin mentioned in the question you linked: let StreamReader guess for you rather than reinventing the wheel. It only requires a very slight modification to your sample.

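String f1;
Encoding encoding;
using (var reader = new StreamReader(fileList[i])) {
  // Note: because of ToLower(), the lowercased text is what gets written back.
  f1 = reader.ReadToEnd().ToLower();
  // CurrentEncoding is only reliable after the first read from the stream.
  encoding = reader.CurrentEncoding;
}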

if (f1.Contains(oPath))
{
  f1 = f1.Replace(oPath, nPath);
  File.WriteAllText(fileList[i], f1, encoding);
}

By default, .NET uses UTF-8. Detecting the character encoding is hard because most of the time .NET will read the file as UTF-8. I always have problems with ANSI.

My trick is to read the file as a stream, force it to be decoded as UTF-8, and check for the usual characters that should appear in the text. If they are found, the file is UTF-8; otherwise it is ANSI. I then tell users they can use only two encodings, ANSI or UTF-8. Auto-detection does not work well for my language :p
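A related way to make the "UTF-8 or ANSI" decision is to ask a strict UTF-8 decoder whether the bytes are well formed, instead of looking for expected characters. This swaps in a different technique from the one described above. The sketch below assumes the whole file fits in memory; the names Utf8OrAnsi and Guess are hypothetical, and the caller supplies the ANSI fallback (for example Encoding.Default on .NET Framework).

```csharp
using System;
using System.Text;

// Hypothetical helper, for illustration only.
public static class Utf8OrAnsi
{
    // Returns a BOM-less UTF-8 encoding if the bytes are valid UTF-8,
    // otherwise the caller-supplied ANSI fallback.
    public static Encoding Guess(byte[] bytes, Encoding ansiFallback)
    {
        // Second argument (throwOnInvalidBytes) makes the decoder throw on
        // malformed sequences instead of silently substituting U+FFFD.
        UTF8Encoding strictUtf8 = new UTF8Encoding(false, true);
        try
        {
            strictUtf8.GetCharCount(bytes);
            return strictUtf8;
        }
        catch (DecoderFallbackException)
        {
            return ansiFallback;
        }
    }
}
```

Pure ASCII input is also valid UTF-8, so it is reported as UTF-8. That is harmless for reading, but it matters if the replacement text introduces non-ASCII characters.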

I am afraid you will have to know the encoding. For UTF-based encodings, though, you can use StreamReader's built-in functionality.

Taken from here.

With regard to encodings - you will need to have identified the encoding in order to use the StreamReader.

However, the StreamReader itself can help if you create it with one of the constructor overloads that allows you to supply the flag detectEncodingFromByteOrderMarks as true (or you can use Encoding.GetPreamble and look at the byte preamble yourself).

Both these methods will only help auto-detect UTF based encodings though - so any ANSI encodings with a specified codepage will probably not be parsed correctly.
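As a rough illustration of the constructor overload mentioned above, the sketch below opens the file with detectEncodingFromByteOrderMarks set to true and a caller-chosen fallback encoding, then writes the changed text back with whatever encoding the reader settled on. The names SameEncodingRewriter and ReplaceInFile are hypothetical, and the fallback is an assumption the caller has to make for files without a BOM.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper, for illustration only.
public static class SameEncodingRewriter
{
    // Replaces oldValue with newValue in the file and writes it back in the
    // encoding the reader used. Returns false if nothing needed changing.
    public static bool ReplaceInFile(string path, string oldValue, string newValue, Encoding fallback)
    {
        string text;
        Encoding encoding;

        // true = honour a BOM if present; 'fallback' is used only when there is none.
        using (StreamReader reader = new StreamReader(path, fallback, true))
        {
            text = reader.ReadToEnd();
            encoding = reader.CurrentEncoding; // reliable only after reading
        }

        if (!text.Contains(oldValue)) return false;

        File.WriteAllText(path, text.Replace(oldValue, newValue), encoding);
        return true;
    }
}
```

Passing new UTF8Encoding(false) as the fallback keeps BOM-less files BOM-less on write, since that encoding has an empty preamble. Passing an ANSI encoding instead treats every BOM-less file as that code page.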

Probably a bit late, but I encountered the same problem myself. Using the previous answers I found a solution that works for me. It reads in the text using a StreamReader with the default encoding, extracts the encoding used for that file, and uses a StreamWriter to write the text back with the changes in the found encoding. It also removes and re-adds the ReadOnly flag.

        string file = "File to open";
        string text;
        Encoding encoding;
        string oldValue = "string to be replaced";
        string replacementValue = "New string";

        // Remember the original attributes and clear ReadOnly so the file can be overwritten.
        FileAttributes attributes = File.GetAttributes(file);
        File.SetAttributes(file, attributes & ~FileAttributes.ReadOnly);

        // Encoding.Default is only the fallback; a BOM, if present, takes precedence.
        using (StreamReader reader = new StreamReader(file, Encoding.Default))
        {
            text = reader.ReadToEnd();
            encoding = reader.CurrentEncoding; // reliable only after reading
        }

        if (text.Contains(oldValue))
        {
            text = text.Replace(oldValue, replacementValue);

            // Overwrite the file (append = false) using the encoding found above.
            using (StreamWriter writer = new StreamWriter(file, false, encoding))
            {
                writer.Write(text);
            }
        }

        // Restore the original attributes, so ReadOnly comes back only if it was set before.
        File.SetAttributes(file, attributes);

The solution for all Germans => ÄÖÜäöüß

This function opens the file and determines the encoding from the BOM.
If the BOM is missing, the file is interpreted as ANSI, unless it contains UTF-8 encoded German umlauts, in which case it is detected as UTF-8.

https://stackoverflow.com/a/69312696/9134997
