c#, Excel + csv: how to get the correct encoding?

2023-01-09 15:17 问答作者：

I've been trying this for quite a while now, but can't figure it out. I'm trying to export data to Excel via a *.csv file. It works great so far, but I have some encoding problems when opening the files in Excel.

(original string on the left, EXCEL result on the right):

Messwert(µm / m) ==> Messwert(Âµm / m)

Dümme Mässöng ==> DÃ¼mme MÃ¤ssÃ¶ng

Notepad++ tells me that the file is encoded "ANSI as UTF8"(WTF?)

So here are different ways I tried to get a valid result: obvious implementation:

tWriter.Write(";Messwert(µm /m)");

more sophisticated one (tried probably a dozen or more encoding combinations:)

tWriter.Write(Encoding.Default.GetString(Encoding.Unicode.GetBytes(";Messwert(µm /m)")));
tWriter.Write(Encoding.ASCII.GetString(Encoding.Unicode.GetBytes(";Messwert(µm /m)")));

and so on

Whole sourc开发者_如何学Goe code for the method creating the data:

    MemoryStream tStream = new MemoryStream();
    StreamWriter tWriter = new StreamWriter(tStream);
    tWriter.Write("\uFEFF");

    tWriter.WriteLine(string.Format("{0}", aMeasurement.Name));
    tWriter.WriteLine(aMeasurement.Comment);
    tWriter.WriteLine();
    tWriter.WriteLine("Zeit in Minuten;Messwert(µm / m)");

    TimeSpan tSpan;
    foreach (IMeasuringPoint tPoint in aMeasurement)
    {
        tSpan = new TimeSpan(tPoint.Time - aMeasurement[0].Time);
        tWriter.WriteLine(string.Format("{0};{1};", (int)tSpan.TotalMinutes, getMPString(tPoint)));
    }

    tWriter.Flush();
    return tStream;

Generated CSV file:

Dümme Mössäng
Testmessung die erste

Zeit in Minuten;Messwert(µm / m)
0;-703;
0;-381;
1;1039;
1;1045;
2;1457;
2;1045;

This worked perfect for me:

private const int WIN_1252_CP = 1252; // Windows ANSI codepage 1252

    this._writer = new StreamWriter(fileName, false, Encoding.GetEncoding(WIN_1252_CP));

CSV encoding issues (Microsoft Excel)

try the following:

using (var sw = File.Create(Path.Combine(txtPath.Text, "UTF8.csv")))
{
  var preamble = Encoding.UTF8.GetPreamble();
  sw.Write(preamble, 0, preamble.Length);
  var data = Encoding.UTF8.GetBytes("懘荧,\"Hello\",text");
  sw.Write(data, 0, data.Length);
}

It writes the proper UTF8 preamble to the file before writing the UTF8 encoded CSV.

This solution is written up as a fix for a Java application however you should be able to do something similar in C#. You may also want to look at the documentation on the StreamWriter class, in the remarks it refers to the Byte Order Mark (BOM).

"ANSI as UTF8"(WTF?)

NotePad++ is probably correct. The encoding is UTF8 (i.e., correct Unicode header), but only contains ANSI data (i.e., é is not encoded in correct UTF8 way, which would mean two bytes).

Or: it is the other way around. It is ANSI (no file header BOM), but the encoding of the individual characters is, or looks like, UTF8. This would explain the ü and other characters expanding in more than one other character. You can fix this by forcing the file to be read as Unicode.

If it's possible to post (part of) your CSV, we may be able to help fixing it at the source.

Edit

Now that we've seen your code: can you remove the StreamWriter and replace it with a TextWriter? Also, remove the hand-encoding of the BOM, it is not necessary. When you create a TextWriter, you can specify the encoding (don't use ASCII, try UTF8).

Trevor Germain's helped me to save in the correct encoded format

using (var sw = File.Create(Path.Combine(txtPath.Text, "UTF8.csv")))
{
    var preamble = Encoding.UTF8.GetPreamble();  
    sw.Write(preamble, 0, preamble.Length);  
    var data = Encoding.UTF8.GetBytes("懘荧,\"Hello\",text");  
    sw.Write(data, 0, data.Length);
}

I'd suggest you open up the text file in a hex editor, and see what it really is. The BOM for UTF-16 is 0xFEFF, which the writing code is apparently writing to the stream - but the rest of the writing doesn't specify an encoding to use - it would use the default encoding of the StreamWriter, which is UTF-8. There appears to be a mix up of encodings.

When you pop open the file in hex view, if you see lots of 0x00 between the characters, you're working with UTF-16, which is Encoding.Unicode in C#. If there are no 0x00 between chars, the encoding is probably UTF-8.

If the latter case, just fix up the BOM to be EF BB BF rather than FE FF, and read normally with UTF-8 encoding.

For my scenario using StreamWriter I found explicitly passing UTF8 encoding to the StreamWriter enabled excel to read the file using the correct encoding.

See this answer for more details: https://stackoverflow.com/a/22306937/999048

继续阅读：csv encoding excel export

c#, Excel + csv: how to get the correct encoding?

Edit

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

Edit

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集 河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？