FileStream prepends junk characters when reading

I am reading a simple text file that contains a single line, using the FileStream class. But it seems FileStream.Read prepends some junk characters at the beginning.

The code is below.

using (var _fs = File.Open(_idFilePath, FileMode.Open, FileAccess.ReadWrite, FileShare.Read))
{
     byte[] b = new byte[_fs.Length];
     UTF8Encoding temp = new UTF8Encoding(true);
     while (_fs.Read(b, 0, b.Length) > 0)
     {
         Console.WriteLine(temp.GetString(b));
         Console.WriteLine(ASCIIEncoding.ASCII.GetString(b));


     }
 }

For example, the data in my text file is just "sample", but the code above prints

  "?sample" and
  "???sample"

What's the reason? Is it a start-of-file indicator? Is there a way to read only my actual content?


The byte order mark (BOM) is the Unicode character U+FEFF, written at the start of a file to mark the encoding used for it.

So if you correctly decode the file as UTF-8, you get that character as the first char of your string. If you incorrectly decode it as ASCII or ANSI, you get 3 chars, since the UTF-8 encoding of U+FEFF is the 3-byte sequence EF BB BF.
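To make that concrete, here is a minimal sketch that decodes the same nine bytes both ways. The class and method names (BomDemo, DecodeUtf8, DecodeAscii) are made up for illustration, and the byte array stands in for the contents of the asker's file, assuming it was saved as UTF-8 with a BOM.

```csharp
using System;
using System.Text;

public static class BomDemo
{
    // UTF-8 BOM (EF BB BF) followed by the ASCII bytes of "sample".
    // Assumption: this mirrors what the file on disk contains.
    public static readonly byte[] FileBytes =
        { 0xEF, 0xBB, 0xBF, 0x73, 0x61, 0x6D, 0x70, 0x6C, 0x65 };

    // Encoding.GetString does not strip the BOM; it decodes it to U+FEFF.
    public static string DecodeUtf8(byte[] bytes)
    {
        return new UTF8Encoding(true).GetString(bytes);
    }

    // ASCII cannot represent bytes above 0x7F, so each BOM byte becomes '?'.
    public static string DecodeAscii(byte[] bytes)
    {
        return Encoding.ASCII.GetString(bytes);
    }

    public static void Main()
    {
        string utf8 = DecodeUtf8(FileBytes);
        Console.WriteLine(utf8.Length);              // 7: U+FEFF plus "sample"
        Console.WriteLine(utf8[0] == '\uFEFF');      // True
        Console.WriteLine(DecodeAscii(FileBytes));   // ???sample
    }
}
```

The console usually cannot display U+FEFF, which is likely why the UTF-8 result shows up as a single "?" in the question.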

But your whole code can be replaced with

File.ReadAllText(fileName, Encoding.UTF8)

and that should remove the BOM too. Alternatively, leave out the encoding parameter and let the function detect the encoding automatically (it uses the BOM for that).
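A small sketch of the File.ReadAllText suggestion follows. It writes a throwaway file with a UTF-8 BOM and reads it back; the class name, the helper ReadSample, and the use of a temp file are assumptions for illustration only.

```csharp
using System;
using System.IO;
using System.Text;

public static class ReadAllTextDemo
{
    // Writes BOM + "sample" to a temporary file, then reads it back as text.
    public static string ReadSample()
    {
        string path = Path.GetTempFileName(); // hypothetical stand-in for the real file
        try
        {
            byte[] bytes = { 0xEF, 0xBB, 0xBF, 0x73, 0x61, 0x6D, 0x70, 0x6C, 0x65 };
            File.WriteAllBytes(path, bytes);

            // ReadAllText consumes the BOM instead of returning it as a character.
            return File.ReadAllText(path, Encoding.UTF8);
        }
        finally
        {
            File.Delete(path);
        }
    }

    public static void Main()
    {
        string text = ReadSample();
        Console.WriteLine(text);        // sample
        Console.WriteLine(text.Length); // 6
    }
}
```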


It could be the BOM, a.k.a. the byte order mark.


You are reading the BOM from the stream. If you are reading text, try using a StreamReader, which will handle this automatically.


Try this instead:

using (StreamReader sr = new StreamReader(File.OpenRead(path), Encoding.UTF8))

It will strip the BOM for you.
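For a self-contained check of the StreamReader approach, the sketch below reads from a MemoryStream instead of a file, so it runs without any file on disk. The class name and the ReadAll helper are hypothetical; the byte array is assumed to match the asker's file.

```csharp
using System;
using System.IO;
using System.Text;

public static class StreamReaderDemo
{
    // Reads all text from a byte buffer through a StreamReader,
    // which skips a leading UTF-8 BOM if one is present.
    public static string ReadAll(byte[] bytes)
    {
        using (var stream = new MemoryStream(bytes))
        using (var reader = new StreamReader(stream, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        byte[] bytes = { 0xEF, 0xBB, 0xBF, 0x73, 0x61, 0x6D, 0x70, 0x6C, 0x65 };
        string text = ReadAll(bytes);
        Console.WriteLine(text);        // sample
        Console.WriteLine(text.Length); // 6
    }
}
```

With a real file, swapping the MemoryStream for File.OpenRead(path) should behave the same way.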
