Preserving text file encoding (ASCII, UTF-8, UTF-16)
I have a simple text file processing tool written in C#. The skeleton looks like this:
    using (StreamReader reader = new StreamReader(absFileName, true)) // auto detect encoding
    using (StreamWriter writer = new StreamWriter(tmpFileName, false, reader.CurrentEncoding)) // open writer with the same encoding as reader
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // do something with line
            writer.WriteLine(line);
        }
    }
Most of the files it operates on are ASCII files, with the occasional UTF-16 file here and there. I want to preserve the file encoding: the newly created file should have the same encoding as the file being read. That's why I open the StreamWriter with the CurrentEncoding of the reader.
My problem is that some of the UTF-16 files lack a preamble, and right after the StreamReader is opened its CurrentEncoding is set to UTF-8, which causes the writer to be opened in UTF-8 mode. When debugging, I can see that the reader changes its CurrentEncoding property to UTF-16 after the first call to ReadLine, but by that time the writer is already open.
I can think of a few workarounds (opening the writer later, or going over the source file twice, with the first pass just to detect the encoding), but thought I'd ask the experts for an opinion first. Note that I'm not concerned with the code pages of the ASCII files; I'm only concerned with the ASCII/UTF-8/UTF-16 encodings.
I'd try doing a reader.Peek() before opening the writer - that ought to be sufficient in your case, I think.
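A minimal sketch of that idea. It assumes, as described in the question, that the reader settles on the right encoding once it has performed its first read. The class and method names (EncodingPreservingCopy, CopyLines) are made up for illustration. Peek() does not consume any characters, so the subsequent ReadLine loop still sees the whole file.

```csharp
using System;
using System.IO;
using System.Text;

static class EncodingPreservingCopy
{
    // Hypothetical helper: copies sourcePath to targetPath line by line,
    // opening the writer only after the reader has had a chance to detect
    // the encoding. Returns the encoding that was used for the writer.
    public static Encoding CopyLines(string sourcePath, string targetPath)
    {
        using (StreamReader reader = new StreamReader(sourcePath, true)) // auto detect encoding
        {
            // Force the reader to fill its buffer; encoding detection is
            // deferred until the first read-type call, and Peek() does not
            // advance the read position.
            reader.Peek();
            Encoding detected = reader.CurrentEncoding;

            using (StreamWriter writer = new StreamWriter(targetPath, false, detected))
            {
                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    // do something with line
                    writer.WriteLine(line);
                }
            }
            return detected;
        }
    }
}
```

Note that the two using statements can no longer be stacked as in the question, because the Peek() call has to run between opening the reader and opening the writer.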