
ConcurrentDictionary: super slow?

I have a file I'm trying to parse, and this is how I'm doing it:

var definitions = new Dictionary<int, string>();

foreach (var line in new RirStatFile("delegated-lacnic-latest.txt"))
{
    for (var i = 0; i < line.Range; i ++)
    {
        definitions[line.StartIpAddress + i] = line.Iso3166CountryCode;
    }
}

new RirStatFile(...) returns an IEnumerable<RirStatFileLine> containing about 4,100 RirStatFileLine objects, where each RirStatFileLine has a .Range whose value is typically between 32,768 and 1 million.

Running this as demonstrated in the snippet above takes about 45 seconds on this pitiful netbook of mine.

EDIT: Dual-core netbook.

Great place to use the new Parallel task library, right? That's what I thought, so I changed the code to:

var definitions = new ConcurrentDictionary<int, string>();

Parallel.ForEach(new RirStatFile("delegated-lacnic-latest.txt"), line => 
{
    Parallel.For(0, line.Range, i =>
    {
        definitions[line.StartIpAddress + i] = line.Iso3166CountryCode;
    });
});

And guess what? The program takes 200 seconds!

What gives? Obviously I don't understand something here that's going on. Just for reference, here's RirStatFileLine:

public class RirStatFileLine
{
    public readonly string Iso3166CountryCode;
    public readonly int StartIpAddress;
    public readonly int Range;

    public RirStatFileLine(string line)
    {
        var segments = line.Split('|');

        // Line:         
        //    lacnic|BR|ipv4|143.54.0.0|65536|19900828|assigned
        // Translation:
        //    rir_name|ISO_countryCode|ipVersion|ipAddress|range|dateStamp|blah

        this.Iso3166CountryCode = segments[1];
        this.StartIpAddress =
         BitConverter.ToInt32(IPAddress.Parse(segments[3]).GetAddressBytes(), 0);
        this.Range = int.Parse(segments[4]);
    }
}

And RirStatFile:

public class RirStatFile : IEnumerable<RirStatFileLine>
{
    private const int headerLineLength = 4;

    private readonly IEnumerable<RirStatFileLine> lines;

    public RirStatFile(string filepath)
    {
        this.lines = File.ReadAllLines(filepath)
                         .Skip(RirStatFile.headerLineLength)
                         .Select(line => new RirStatFileLine(line)); 
    }

    public IEnumerator<RirStatFileLine> GetEnumerator()
    {
        return this.lines.GetEnumerator();
    }

    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return this.lines.GetEnumerator();
    }
}


No surprise here. You are taking some very cheap operation (adding an entry to a dictionary) and wrapping it up in some expensive parallelization code.

You should parallelize computationally expensive code not trivial code.
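As a minimal sketch of what "parallelize only the coarse-grained work" means here: keep Parallel.ForEach over the lines, but make the inner loop a plain for, since each dictionary write is far too cheap to justify a per-iteration delegate call. The Demo class and its synthetic data below are illustrative stand-ins for the real RIR file, not the original code:

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

class Demo
{
    static void Main()
    {
        // Synthetic stand-in for the parsed file: 100 non-overlapping
        // (start, range, country) entries of 1,000 addresses each.
        var lines = Enumerable.Range(0, 100)
            .Select(n => (Start: n * 1000, Range: 1000, Country: "BR"))
            .ToList();

        var definitions = new ConcurrentDictionary<int, string>();

        // Parallelize only the outer loop over lines; the inner loop stays
        // sequential because each iteration is just one dictionary write.
        Parallel.ForEach(lines, line =>
        {
            for (var i = 0; i < line.Range; i++)
            {
                definitions[line.Start + i] = line.Country;
            }
        });

        Console.WriteLine(definitions.Count); // 100 lines x 1,000 = 100000
    }
}
```

Each Parallel.For iteration invokes a delegate and coordinates with the scheduler, so wrapping a single dictionary assignment in it costs far more than the assignment itself; the outer loop gives each worker thousands of writes per delegate call instead.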

Also, you are using ReadAllLines instead of ReadLines so there's no opportunity for any processing to happen overlapped with reading the file.

From MSDN: "The ReadLines and ReadAllLines methods differ as follows: When you use ReadLines, you can start enumerating the collection of strings before the whole collection is returned; when you use ReadAllLines, you must wait for the whole array of strings to be returned before you can access the array. Therefore, when you are working with very large files, ReadLines can be more efficient."
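To illustrate the streaming behavior, here is a tiny standalone demo (using a temp file rather than the real RIR data) showing File.ReadLines feeding a lazy Skip/Select pipeline, the same shape as the RirStatFile constructor. One caveat worth hedging: because ReadLines is deferred, every enumeration of such a query re-reads the file, so you'd want to enumerate it exactly once.

```csharp
using System;
using System.IO;
using System.Linq;

class ReadLinesDemo
{
    static void Main()
    {
        // Fake input: two header lines, then two data lines.
        var path = Path.GetTempFileName();
        File.WriteAllLines(path, new[] { "header1", "header2", "a|x", "b|y" });

        // ReadLines yields each line as it is read from disk, so the
        // Select can start "parsing" before the whole file is in memory.
        var parsed = File.ReadLines(path)
                         .Skip(2)                       // skip header lines
                         .Select(l => l.Split('|')[0])  // cheap stand-in parse
                         .ToList();                     // enumerate exactly once

        Console.WriteLine(string.Join(",", parsed)); // a,b
        File.Delete(path);
    }
}
```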


The problem here is that your netbook only has a single CPU/core/hardware thread. Parallelizing won't help at all here.
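Whether parallelism can help at all is easy to check at runtime: Environment.ProcessorCount reports how many logical processors the scheduler has to work with. A trivial check:

```csharp
using System;

class CoreCheck
{
    static void Main()
    {
        // Parallel.ForEach can only give a speedup when there is more than
        // one logical processor; on a single core it adds pure overhead.
        Console.WriteLine(Environment.ProcessorCount);
    }
}
```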

