Size difference when reading UTF8 encoded file
I'm trying to read a UTF8 encoded file (.torrent). In the file there is a 'pieces' section. Directly following that is the length of the text that contains a sequence of SHA1 hashes. The file reports a length (say 130100) to read, but when reading I end up going past EOF.
I'm not sure why this is happening. The files are good (I've tested them with existing torrent clients and I've tried a number of them with consistent results) and I'm reading them with this:
string contents = string.Empty;
using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read))
{
    using (StreamReader reader = new StreamReader(fs, Encoding.UTF8))
    {
        contents = reader.ReadToEnd();
    }
}
parse(contents);
However, this obviously isn't working. Am I reading the file wrong, or am I storing it in a string incorrectly before trying to parse it? It seems to only fault when it reads characters outside of the normal range of readable strings.
BitTorrent files aren't UTF-8-encoded. Some or all of the filenames in the files->path/name property may be UTF-8 encoded strings, but the file as a whole is purely binary, and the contents of the pieces property is a binary string containing the hashes. It makes no sense to try to read a .torrent with a TextReader.
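To illustrate why the reported length runs past the end of the string, here is a minimal sketch. The class and method names (Utf8LossDemo, DecodedLength, RoundTrip, ReadTorrent) and the sample bytes are made up for this example. Decoding arbitrary bytes as UTF-8 merges valid multi-byte sequences into a single character and replaces invalid bytes with U+FFFD, so the resulting string no longer has one character per byte of the file.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper class, for illustration only.
static class Utf8LossDemo
{
    // Number of chars produced when raw bytes are decoded as UTF-8.
    public static int DecodedLength(byte[] raw)
    {
        return Encoding.UTF8.GetString(raw).Length;
    }

    // Decode as UTF-8, then encode again. For binary data the result
    // generally differs from the input, so the original bytes are lost.
    public static byte[] RoundTrip(byte[] raw)
    {
        return Encoding.UTF8.GetBytes(Encoding.UTF8.GetString(raw));
    }

    // Read the file as raw bytes instead, with no text decoding at all.
    public static byte[] ReadTorrent(string path)
    {
        return File.ReadAllBytes(path);
    }
}
```

For example, the four bytes 0xC3 0xA9 0xFF 0x41 decode to fewer than four characters, because 0xC3 0xA9 is a valid two-byte sequence that becomes one character. A length prefix that counts bytes therefore overshoots when applied to the decoded string.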
The format in which BitTorrent files are stored is a simple structured-value serialisation known as bencode. You will want to use a proper bencode parser to extract information from a .torrent file. It's not difficult to write one (after all, you only get four datatypes: integers, byte strings, lists and dictionaries), or see theory's libraries list for a couple of existing .NET libraries.
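As a rough idea of what such a parser involves, here is a minimal decoder sketch. The Bencode class and its method names are hypothetical and not taken from any existing library. It assumes the whole file fits in memory, returns integers as long, byte strings as byte[], lists as List<object> and dictionaries as Dictionary<string, object>, and decodes dictionary keys as UTF-8 for convenience. It does not validate key ordering or reject leading zeros.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Hypothetical minimal bencode decoder, for illustration only.
static class Bencode
{
    public static object Decode(byte[] data)
    {
        int pos = 0;
        return ReadValue(data, ref pos);
    }

    static object ReadValue(byte[] data, ref int pos)
    {
        byte b = Peek(data, pos);
        if (b == (byte)'i')                       // integer: i<digits>e
        {
            pos++;
            return ReadNumber(data, ref pos, (byte)'e');
        }
        if (b == (byte)'l')                       // list: l<values>e
        {
            pos++;
            var list = new List<object>();
            while (Peek(data, pos) != (byte)'e') list.Add(ReadValue(data, ref pos));
            pos++;                                // skip closing 'e'
            return list;
        }
        if (b == (byte)'d')                       // dictionary: d<key><value>...e
        {
            pos++;
            var dict = new Dictionary<string, object>();
            while (Peek(data, pos) != (byte)'e')
            {
                string key = Encoding.UTF8.GetString(ReadBytes(data, ref pos));
                dict[key] = ReadValue(data, ref pos);
            }
            pos++;                                // skip closing 'e'
            return dict;
        }
        if (b >= (byte)'0' && b <= (byte)'9') return ReadBytes(data, ref pos);
        throw new FormatException("Unexpected byte at offset " + pos);
    }

    // Byte string: <length>:<raw bytes>. The length counts bytes, not characters.
    static byte[] ReadBytes(byte[] data, ref int pos)
    {
        long length = ReadNumber(data, ref pos, (byte)':');
        if (length < 0 || length > data.Length - pos)
            throw new FormatException("String length exceeds remaining data");
        var result = new byte[length];
        Array.Copy(data, pos, result, 0, (int)length);
        pos += (int)length;
        return result;
    }

    // Reads ASCII digits (with optional minus sign) up to the terminator byte.
    static long ReadNumber(byte[] data, ref int pos, byte terminator)
    {
        int start = pos;
        while (Peek(data, pos) != terminator) pos++;
        string digits = Encoding.ASCII.GetString(data, start, pos - start);
        pos++;                                    // skip terminator
        return long.Parse(digits, CultureInfo.InvariantCulture);
    }

    static byte Peek(byte[] data, int pos)
    {
        if (pos >= data.Length) throw new FormatException("Unexpected end of data");
        return data[pos];
    }
}
```

With something like this, the result of Bencode.Decode(File.ReadAllBytes(path)) can be cast to Dictionary<string, object>, and the pieces value under the info dictionary comes back as a byte[] whose length matches the number in the file, since nothing is ever decoded as text.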