
ConcurrentDictionary: super slow?

I have a file I'm trying to parse, and this is how I'm doing it:

var definitions = new Dictionary<int, string>();

foreach (var line in new RirStatFile("delegated-lacnic-latest.txt"))
{
    for (var i = 0; i < line.Range; i ++)
    {
        definitions[line.StartIpAddress + i] = line.Iso3166CountryCode;
    }
}

new RirStatFile(...) returns an IEnumerable<RirStatFileLine> containing about 4,100 RirStatFileLine objects, where each RirStatFileLine has a .Range whose value is typically between 32,768 and 1 million.

Running this as demonstrated in the snippet above takes about 45 seconds on this pitiful netbook of mine.

EDIT: Dual-core netbook.

Great place to use the new Parallel task library, right? That's what I thought, so I changed the code to:

var definitions = new ConcurrentDictionary<int, string>();

Parallel.ForEach(new RirStatFile("delegated-lacnic-latest.txt"), line => 
{
    Parallel.For(0, line.Range, i =>
    {
        definitions[line.StartIpAddress + i] = line.Iso3166CountryCode;
    });
});

And guess what? The program takes 200 seconds!

What gives? Obviously I don't understand something here that's going on. Just for reference, here's RirStatFileLine:

public class RirStatFileLine
{
    public readonly string Iso3166CountryCode;
    public readonly int StartIpAddress;
    public readonly int Range;

    public RirStatFileLine(string line)
    {
        var segments = line.Split('|');

        // Line:         
        //    lacnic|BR|ipv4|143.54.0.0|65536|19900828|assigned
        // Translation:
        //    rir_name|ISO_countryCode|ipVersion|ipAddress|range|dateStamp|blah

        this.Iso3166CountryCode = segments[1];
        this.StartIpAddress =
         BitConverter.ToInt32(IPAddress.Parse(segments[3]).GetAddressBytes(), 0);
        this.Range = int.Parse(segments[4]);
    }
}

And RirStatFile:

public class RirStatFile : IEnumerable<RirStatFileLine>
{
    private const int headerLineLength = 4;

    private readonly IEnumerable<RirStatFileLine> lines;

    public RirStatFile(string filepath)
    {
        this.lines = File.ReadAllLines(filepath)
                         .Skip(RirStatFile.headerLineLength)
                         .Select(line => new RirStatFileLine(line)); 
    }

    public IEnumerator<RirStatFileLine> GetEnumerator()
    {
        return this.lines.GetEnumerator();
    }

    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return this.lines.GetEnumerator();
    }
}


No surprise here. You are taking some very cheap operation (adding an entry to a dictionary) and wrapping it up in some expensive parallelization code.

You should parallelize computationally expensive code not trivial code.
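As a minimal sketch of what "parallelize only the coarse-grained work" means here: keep Parallel.ForEach over the lines, but make the inner loop a plain for, since each dictionary write is far too cheap to justify a per-iteration delegate call. The Demo class and its synthetic data below are illustrative stand-ins for the real RIR file, not the original code:

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

class Demo
{
    static void Main()
    {
        // Synthetic stand-in for the parsed file: 100 non-overlapping
        // (start, range, country) entries of 1,000 addresses each.
        var lines = Enumerable.Range(0, 100)
            .Select(n => (Start: n * 1000, Range: 1000, Country: "BR"))
            .ToList();

        var definitions = new ConcurrentDictionary<int, string>();

        // Parallelize only the outer loop over lines; the inner loop stays
        // sequential because each iteration is just one dictionary write.
        Parallel.ForEach(lines, line =>
        {
            for (var i = 0; i < line.Range; i++)
            {
                definitions[line.Start + i] = line.Country;
            }
        });

        Console.WriteLine(definitions.Count); // 100 lines x 1,000 = 100000
    }
}
```

Each Parallel.For iteration invokes a delegate and coordinates with the scheduler, so wrapping a single dictionary assignment in it costs far more than the assignment itself; the outer loop gives each worker thousands of writes per delegate call instead.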

Also, you are using ReadAllLines instead of ReadLines so there's no opportunity for any processing to happen overlapped with reading the file.

From MSDN: "The ReadLines and ReadAllLines methods differ as follows: When you use ReadLines, you can start enumerating the collection of strings before the whole collection is returned; when you use ReadAllLines, you must wait for the whole array of strings to be returned before you can access the array. Therefore, when you are working with very large files, ReadLines can be more efficient."
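To illustrate the streaming behavior, here is a tiny standalone demo (using a temp file rather than the real RIR data) showing File.ReadLines feeding a lazy Skip/Select pipeline, the same shape as the RirStatFile constructor. One caveat worth hedging: because ReadLines is deferred, every enumeration of such a query re-reads the file, so you'd want to enumerate it exactly once.

```csharp
using System;
using System.IO;
using System.Linq;

class ReadLinesDemo
{
    static void Main()
    {
        // Fake input: two header lines, then two data lines.
        var path = Path.GetTempFileName();
        File.WriteAllLines(path, new[] { "header1", "header2", "a|x", "b|y" });

        // ReadLines yields each line as it is read from disk, so the
        // Select can start "parsing" before the whole file is in memory.
        var parsed = File.ReadLines(path)
                         .Skip(2)                       // skip header lines
                         .Select(l => l.Split('|')[0])  // cheap stand-in parse
                         .ToList();                     // enumerate exactly once

        Console.WriteLine(string.Join(",", parsed)); // a,b
        File.Delete(path);
    }
}
```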


The problem here is that your netbook only has a single CPU/core/hardware thread. Parallelizing won't help at all here.
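Whether parallelism can help at all is easy to check at runtime: Environment.ProcessorCount reports how many logical processors the scheduler has to work with. A trivial check:

```csharp
using System;

class CoreCheck
{
    static void Main()
    {
        // Parallel.ForEach can only give a speedup when there is more than
        // one logical processor; on a single core it adds pure overhead.
        Console.WriteLine(Environment.ProcessorCount);
    }
}
```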

