Converting files to UTF-8, but characters get destroyed

foreach (var f in new DirectoryInfo(@"...").GetFiles("*.cs", SearchOption.AllDirectories)) {
  string s = File.ReadAllText(f.FullName);
  File.WriteAllText (f.FullName, s, Encoding.UTF8);
}

But when converting, the characters get destroyed. How can I keep čćžšđ from being destroyed?

Manually, I first copy the text, then convert the file to UTF-8 and paste the text back, and the characters are OK. But here I have more than 200 files, and that is too much.


When using File.ReadAllText, ensure you are reading the files with the correct encoding. For example, with ASCII files:

string s = File.ReadAllText(f.FullName, Encoding.ASCII);

The value gets "destroyed" during the read if you use the incorrect encoding.

You can get an Encoding for the correct code page using the code page ID (see this page for IDs). Encoding is abstract, so use Encoding.GetEncoding rather than a constructor:

var myEncoding = Encoding.GetEncoding(10081); // for Turkish (Mac)
string s = File.ReadAllText(f.FullName, myEncoding);
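
To make the point about the read step concrete, here is a minimal sketch that decodes the same bytes twice. It assumes the files are Windows-1250 (code page 1250), which is only a guess based on the characters čćžšđ; the class and method names are hypothetical. On .NET Core / .NET 5+ the legacy code pages have to be registered first, which the sketch does.

```csharp
using System;
using System.Text;

public static class EncodingDemo
{
    // Decodes the same bytes with the wrong and the right encoding
    // to show that the damage happens while reading, not while writing.
    public static (string Wrong, string Right) DecodeBothWays()
    {
        // Makes legacy code pages such as 1250 available on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // ASSUMPTION: the source files are Windows-1250 (Central European).
        Encoding source = Encoding.GetEncoding(1250);
        byte[] bytes = source.GetBytes("čćžšđ");

        // Misreading the bytes as UTF-8 yields replacement characters or unrelated text.
        string wrong = Encoding.UTF8.GetString(bytes);

        // Reading them with the encoding they were written in restores the text.
        string right = source.GetString(bytes);

        return (wrong, right);
    }
}
```

Once the string has been decoded incorrectly, writing it out as UTF-8 cannot repair it, because the original bytes are no longer recoverable from the string.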


You'll have to use the File.ReadAllText(string, Encoding) overload. What that encoding should be cannot be guessed from your question, but it is unlikely to be UTF-8, which is what the ReadAllText(string) overload assumes. Try this:

string s = File.ReadAllText(f.FullName, Encoding.Default);

which uses your machine's default code page. If the source code files were not created on your machine then find out what the code page was for the machine where the files came from.
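
Put together, the loop from the question could look roughly like the sketch below, with the source encoding passed in explicitly. Passing it explicitly also sidesteps a platform difference: on .NET Core and .NET 5+, Encoding.Default is always UTF-8 rather than the machine's ANSI code page. The class name, method name, and parameters here are hypothetical, not part of any library.

```csharp
using System.IO;
using System.Text;

public static class Utf8Converter
{
    // Re-encodes every matching file under root from sourceEncoding to UTF-8 (with BOM).
    // Returns the number of files rewritten.
    public static int ConvertAll(string root, string pattern, Encoding sourceEncoding)
    {
        int count = 0;

        // GetFiles returns the full list up front, so rewriting files does not disturb the loop.
        foreach (string path in Directory.GetFiles(root, pattern, SearchOption.AllDirectories))
        {
            // Decode with the encoding the file was actually saved in.
            string text = File.ReadAllText(path, sourceEncoding);

            // Encoding.UTF8 writes a byte order mark, as in the question's code.
            File.WriteAllText(path, text, Encoding.UTF8);
            count++;
        }

        return count;
    }
}
```

A call might look like Utf8Converter.ConvertAll(root, "*.cs", Encoding.GetEncoding(1250)), where root is your source folder and 1250 is only an assumed code page. It is worth trying this on a copy of the folder first, since the files are overwritten in place.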


Convert all files in a directory from ANSI to UTF-8

Github Project

string[] files = System.IO.Directory.GetFiles(fbd.SelectedPath, "*.txt", System.IO.SearchOption.AllDirectories);

foreach (var file in files)
{
    // Read the raw bytes; no StreamReader is needed for this.
    byte[] ansiBytes = File.ReadAllBytes(file);

    if (!IsUTF8Bytes(ansiBytes))
    {
        // Keep the original as a backup named file + "_".
        System.IO.File.Move(file, file + "_");

        // Decode with the machine's default (ANSI) code page...
        var text = Encoding.Default.GetString(ansiBytes);

        // ...and write it back; WriteAllText(path, text) encodes as UTF-8 without a BOM.
        File.WriteAllText(file, text);
    }
}

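    // Heuristic: returns true if the bytes are structurally valid UTF-8 (plain ASCII also passes).
    private static bool IsUTF8Bytes(byte[] data)
    {
        // 1 = expecting the start of a character; n > 1 = n - 1 continuation bytes still expected.
        int charByteCounter = 1;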
        byte curByte;
        for (int i = 0; i < data.Length; i++)
        {
            curByte = data[i];
            if (charByteCounter == 1)
            {
                if (curByte >= 0x80)
                {
                    while (((curByte <<= 1) & 0x80) != 0)
                    {
                        charByteCounter++;
                    }

                    // A lone continuation byte is invalid, and UTF-8 sequences are at most 4 bytes long.
                    if (charByteCounter == 1 || charByteCounter > 4)
                    {
                        return false;
                    }
                }
            }
            else
            {
                if ((curByte & 0xC0) != 0x80)
                {
                    return false;
                }
                charByteCounter--;
            }
        }
        if (charByteCounter > 1)
        {
            // The data ends in the middle of a multi-byte character, so it is not valid UTF-8.
            return false;
        }
        return true;
    }
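
As an alternative to the hand-written scanner, the same question can be put to the framework's own decoder. The sketch below uses a UTF8Encoding constructed with throwOnInvalidBytes set to true; the class and method names are hypothetical. Either way this is only a heuristic: a legacy-encoded file whose bytes happen to form valid UTF-8 sequences will still pass.

```csharp
using System;
using System.Text;

public static class Utf8Check
{
    // Returns true if data decodes as UTF-8 without any invalid or truncated sequences.
    public static bool IsValidUtf8(byte[] data)
    {
        // The second argument makes the decoder throw instead of substituting U+FFFD.
        var strict = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);
        try
        {
            strict.GetCharCount(data);
            return true;
        }
        catch (ArgumentException)
        {
            // DecoderFallbackException derives from ArgumentException.
            return false;
        }
    }
}
```

Unlike the scanner above, the strict decoder also rejects overlong encodings and encoded surrogates, which are structurally plausible but not valid UTF-8.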