Size difference when reading UTF8 encoded file

I'm trying to read a UTF-8-encoded file (.torrent). In the file there is a 'pieces' section. Directly following that is the length of the text that contains a sequence of SHA1 hashes. The file reports a length to read (say 130100), but when reading I end up going past EOF.

I'm not sure why this is happening. The files are good (I've tested them with existing torrent clients and I've tried a number of them with consistent results) and I'm reading them with this:

using System.IO;
using System.Text;

string contents = string.Empty;
using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read))
{
    // Read the whole file and decode it as UTF-8 text
    using (StreamReader reader = new StreamReader(fs, Encoding.UTF8))
    {
        contents = reader.ReadToEnd();
    }
}

parse(contents);

However, this obviously isn't working. Am I reading the file wrong, or am I storing it in a string incorrectly before trying to parse it? It only seems to fail when it reads characters outside the normal range of printable text.


BitTorrent files aren't UTF-8-encoded. Some or all of the filenames in the files->path/name property may be UTF-8 encoded strings, but the file as a whole is purely binary, and the contents of the pieces property is a binary string containing the hashes. It makes no sense to try to read a .torrent with a TextReader.
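To see the length mismatch concretely, here is a small C# sketch (the Utf8RoundTripDemo class and its RoundTrip method are hypothetical names, used only for illustration). It decodes a byte array with Encoding.UTF8, the same decoding the StreamReader in the question performs, and then re-encodes the result. A valid multi-byte sequence collapses into a single char, while a byte that is not valid UTF-8 is replaced with U+FFFD, so neither the character count nor the re-encoded size is guaranteed to match the byte count that a bencoded length refers to.

```csharp
using System.Text;

// Illustrates why decoding binary data as UTF-8 breaks byte-based lengths.
public static class Utf8RoundTripDemo
{
    // Decodes raw bytes as UTF-8 (as StreamReader with Encoding.UTF8 does),
    // re-encodes the resulting string, and reports all three sizes.
    public static (int OriginalBytes, int DecodedChars, int ReEncodedBytes) RoundTrip(byte[] raw)
    {
        // Invalid sequences are silently replaced with U+FFFD; valid
        // multi-byte sequences become fewer chars than bytes.
        string text = Encoding.UTF8.GetString(raw);

        // U+FFFD re-encodes as three bytes (EF BF BD), so the data can grow.
        byte[] reEncoded = Encoding.UTF8.GetBytes(text);

        return (raw.Length, text.Length, reEncoded.Length);
    }
}
```

For example, the three bytes C3 A9 FF decode to two characters ('é' followed by U+FFFD) and re-encode to five bytes. A SHA1 hash is 20 arbitrary bytes, so the 'pieces' data is full of sequences like these. Keeping the file as a byte[], for instance via File.ReadAllBytes(path), avoids the problem entirely.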

The format under which BitTorrent files are stored is a simple structured-value serialisation known as bencode. You will want to use a proper bencode parser to extract information from a .torrent file. It's not difficult to write one (after all, you only get four datatypes), or see theory's libraries list for a couple of existing .NET libraries.
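A minimal bencode decoder is short enough to sketch. The C# outline below works directly on a byte[] (for example the result of File.ReadAllBytes), because every string length in bencode counts bytes, so the data is never decoded to text. The class name Bencode, the return types (byte[] for strings, long for integers, List<object> for lists, Dictionary<string, object> for dictionaries) and the SplitPieces helper are assumptions of this sketch, not the API of any existing library. It also skips some checks the format calls for, such as sorted dictionary keys and no leading zeros in integers.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Minimal bencode decoder sketch that operates on raw bytes, never on decoded text.
// Byte strings -> byte[], integers -> long, lists -> List<object>,
// dictionaries -> Dictionary<string, object> (keys decoded as UTF-8 for convenience).
// It does not enforce key ordering or reject leading zeros.
public static class Bencode
{
    public static object Decode(byte[] data)
    {
        int pos = 0;
        object result = ReadValue(data, ref pos);
        if (pos != data.Length) throw new FormatException("Trailing data after bencoded value");
        return result;
    }

    // "pieces" is a concatenation of 20-byte SHA1 hashes, one per piece.
    public static List<byte[]> SplitPieces(byte[] pieces)
    {
        if (pieces.Length % 20 != 0) throw new FormatException("pieces length is not a multiple of 20");
        var hashes = new List<byte[]>();
        for (int i = 0; i < pieces.Length; i += 20)
        {
            var hash = new byte[20];
            Array.Copy(pieces, i, hash, 0, 20);
            hashes.Add(hash);
        }
        return hashes;
    }

    private static object ReadValue(byte[] data, ref int pos)
    {
        byte b = Peek(data, pos);
        if (b == (byte)'i')
        {
            pos++; // skip 'i'
            return ReadInteger(data, ref pos, (byte)'e');
        }
        if (b == (byte)'l')
        {
            pos++; // skip 'l'
            var list = new List<object>();
            while (Peek(data, pos) != (byte)'e') list.Add(ReadValue(data, ref pos));
            pos++; // skip 'e'
            return list;
        }
        if (b == (byte)'d')
        {
            pos++; // skip 'd'
            var dict = new Dictionary<string, object>();
            while (Peek(data, pos) != (byte)'e')
            {
                string key = Encoding.UTF8.GetString(ReadBytes(data, ref pos));
                dict[key] = ReadValue(data, ref pos);
            }
            pos++; // skip 'e'
            return dict;
        }
        if (b >= (byte)'0' && b <= (byte)'9') return ReadBytes(data, ref pos);
        throw new FormatException($"Unexpected byte 0x{b:X2} at offset {pos}");
    }

    // Reads "<length>:<bytes>". The length counts BYTES, which is exactly
    // why the file must not be decoded to a string before parsing.
    private static byte[] ReadBytes(byte[] data, ref int pos)
    {
        long length = ReadInteger(data, ref pos, (byte)':');
        if (length < 0 || length > data.Length - pos)
            throw new FormatException("Byte string runs past end of data");
        int len = (int)length;
        var bytes = new byte[len];
        Array.Copy(data, pos, bytes, 0, len);
        pos += len;
        return bytes;
    }

    // Parses ASCII digits (with an optional leading '-') up to the terminator,
    // then consumes the terminator.
    private static long ReadInteger(byte[] data, ref int pos, byte terminator)
    {
        int start = pos;
        while (Peek(data, pos) != terminator) pos++;
        string digits = Encoding.ASCII.GetString(data, start, pos - start);
        pos++; // skip terminator
        return long.Parse(digits, NumberStyles.AllowLeadingSign, CultureInfo.InvariantCulture);
    }

    private static byte Peek(byte[] data, int pos)
    {
        if (pos >= data.Length) throw new FormatException("Unexpected end of data");
        return data[pos];
    }
}
```

To get at the hashes, one would cast Bencode.Decode(File.ReadAllBytes(path)) to Dictionary<string, object>, take its "info" entry (itself a dictionary), and pass the byte[] stored under "pieces" to SplitPieces. Because the 'pieces' length is consumed as a byte count against a byte array, it lines up exactly with the data and cannot run past EOF on a well-formed file.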