.NET : StreamReader does not recognize ° characters

I am trying to run a regex to locate degree characters (\u00B0 or \u00BA, plus the alternative form of the ' mark, \u00B4). I am reading latitude and longitude DMS coordinates like this one: 12º30'23.256547"S

The problem is with the way I am reading the file as I can manually inject a string like the one below (format is latitude, longitude, description):

const string myTestString = @"12º30'23.256547""S, 12º30'23.256547""W, Somewhere";

and my regex matches as expected. I can also see the º values there, whereas when I use the StreamReader I see a � for every unrecognized character (the º symbol being one of them).

I've tried:

            var sr = new StreamReader(dlg.File.OpenRead(), Encoding.UTF8);
            var sr = new StreamReader(dlg.File.OpenRead(), Encoding.Unicode);
            var sr = new StreamReader(dlg.File.OpenRead(), Encoding.BigEndianUnicode);

in addition to the default ASCII.

Whichever way I read the file, I end up with these special characters. Any advice would be greatly appreciated!!


You've tried various encodings... but presumably not the right one. You shouldn't just be guessing at encodings - find out what encoding it's really using, and use that. StreamReader itself is absolutely fine. It can deal with any encoding you give it, but it does have to match the encoding used when writing the file out.

Where does the file come from? What has written it out?

If it was written out with Notepad, it may well be using Encoding.Default, which is the system's default encoding (i.e. it will vary from machine to machine). If at all possible, change whatever is creating the file to use a single standard encoding - personally I'm a big fan of UTF-8.
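If you cannot find out what wrote the file, one rough heuristic is to try a strict UTF-8 decode first and fall back to a single-byte encoding only when that fails. The sketch below is an illustration, not a reliable detector: the class and method names are made up, and the ISO-8859-1 fallback is an assumption about where the file came from.

```csharp
using System;
using System.Text;

public static class EncodingGuess
{
    // Tries strict UTF-8 first; if the bytes are not valid UTF-8,
    // decodes them with the supplied fallback encoding instead.
    // Note: a leading byte order mark is not stripped by GetString.
    public static string DecodeWithFallback(byte[] bytes, Encoding fallback)
    {
        // throwOnInvalidBytes: true makes invalid input throw
        // instead of silently producing U+FFFD.
        var strictUtf8 = new UTF8Encoding(false, true);
        try
        {
            return strictUtf8.GetString(bytes);
        }
        catch (DecoderFallbackException)
        {
            return fallback.GetString(bytes);
        }
    }

    public static void Main()
    {
        Encoding latin1 = Encoding.GetEncoding("ISO-8859-1"); // assumed fallback

        byte[] utf8Bytes = Encoding.UTF8.GetBytes("12\u00BA30'");
        byte[] latin1Bytes = { 0x31, 0x32, 0xBA }; // "12º" in ISO-8859-1

        Console.WriteLine(DecodeWithFallback(utf8Bytes, latin1) == "12\u00BA30'");
        Console.WriteLine(DecodeWithFallback(latin1Bytes, latin1) == "12\u00BA");
    }
}
```

Text that happens to be valid UTF-8 but was written in another encoding would still be misread, so fixing the encoding at the source remains the better option.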


You need to identify what encoding the file was saved in, and use that when you read it with your streamreader.

If it was created with a regular text editor, I'm guessing the default encoding is either Windows-1252 or ISO-8859-1.

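The º in your sample (U+00BA, strictly the masculine ordinal indicator) is byte 0xBA in ISO-8859-1, and the true degree sign ° (U+00B0) is 0xB0. Both fall outside the 7-bit ASCII table. I don't know how Encoding.ASCII interprets them.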
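To see how the same byte comes out under different encodings, here is a minimal sketch that decodes three sample bytes three ways. The class name DecodeDemo and the sample bytes are made up for illustration, and it assumes the file was saved as ISO-8859-1.

```csharp
using System;
using System.Text;

public static class DecodeDemo
{
    // "12º" as an ISO-8859-1 file would store it (assumed sample data).
    public static readonly byte[] Sample = { 0x31, 0x32, 0xBA };

    // Decodes the sample bytes with the given encoding.
    public static string Decode(Encoding encoding)
    {
        return encoding.GetString(Sample);
    }

    public static void Main()
    {
        Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");

        // Matching encoding: 0xBA becomes U+00BA.
        Console.WriteLine(Decode(latin1) == "12\u00BA");
        // UTF-8: a lone 0xBA is invalid, so it becomes U+FFFD (shown as �).
        Console.WriteLine(Decode(Encoding.UTF8) == "12\uFFFD");
        // ASCII: bytes above 0x7F are replaced with '?'.
        Console.WriteLine(Decode(Encoding.ASCII) == "12?");
    }
}
```

The comparisons are printed as booleans rather than as the decoded text, so the result does not depend on the console's own output encoding.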

Otherwise, it might be easier to just make sure to save the file as UTF-8 if you have that possibility.

The reason it works when you define the string in code is that .NET always holds strings in its internal encoding (UTF-16). What StreamReader does is convert the bytes it reads from the file into that internal encoding, using the encoding you specify when you create the StreamReader.
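A small sketch of that conversion, using a MemoryStream as a stand-in for the file: the same bytes are read through StreamReader with a matching and a mismatched encoding, and a simple character-class regex counts the degree-like characters. The names ReaderDemo and CountDegrees are hypothetical, and the ISO-8859-1 file encoding is an assumption.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

public static class ReaderDemo
{
    // Matches either degree-like character: ° (U+00B0) or º (U+00BA).
    private static readonly Regex Degree = new Regex("[\u00B0\u00BA]");

    // Counts degree-like characters after decoding the bytes with `encoding`.
    public static int CountDegrees(byte[] fileBytes, Encoding encoding)
    {
        using (var reader = new StreamReader(new MemoryStream(fileBytes), encoding))
        {
            return Degree.Matches(reader.ReadToEnd()).Count;
        }
    }

    public static void Main()
    {
        Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");
        // Stand-in for the file contents (assumed to be saved as ISO-8859-1).
        byte[] fileBytes = latin1.GetBytes(
            "12\u00BA30'23.256547\"S, 12\u00BA30'23.256547\"W, Somewhere");

        Console.WriteLine(CountDegrees(fileBytes, latin1));        // 2
        Console.WriteLine(CountDegrees(fileBytes, Encoding.UTF8)); // 0
    }
}
```

With the mismatched encoding each 0xBA byte is decoded to U+FFFD, which is why the regex finds nothing.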


You can open the file in an editor like Notepad++ to see which encoding it uses and convert it to UTF-8. After that, reading it the way you already do, 'var sr = new StreamReader(dlg.File.OpenRead(), Encoding.UTF8);', will work. I was able to read the degree symbol this way.
