Is it possible to reserve the size of a hash table in Perl?
I'm creating a hash table in Perl, of an unknown size.
The hash table maps a string to a reference to an array.
The main loop of my application adds 5-10 elements to the hash table in each iteration. As the hash table fills up, things start to slow down drastically. From observation, once there are ~50k keys in the hash table, adding further keys is roughly 20x slower.
I postulate that the hash table has become full, and collisions are occurring. I would like to 'reserve' the size of the hash table, but I'm unsure how.
The hash in question is %hNgramsToWord.
For each word, every ngram of that word, from length 1 up to the full length of the word, is added as a key, mapping to a reference to an array of the words which contain that ngram.
For example:
AddToNgramHash("hello");
[h, e, l, l, o, he, el, ll, lo, hel, ell, llo, hell, ello, hello] are all added as keys, mapping to "hello"
sub AddToNgramHash($) {
    my $word = shift;
    my @aNgrams = MakeNgrams($word);
    foreach my $ngram (@aNgrams) {
        my @aWords;
        if (defined($hNgramsToWord{$ngram})) {
            @aWords = @{$hNgramsToWord{$ngram}};
        }
        push(@aWords, $word);
        $hNgramsToWord{$ngram} = \@aWords;
    }
    return scalar keys %hNgramsToWord;
}
sub MakeNgrams($) {
    my $word = shift;
    my $len = length($word);
    my @aNgrams;
    for (1..$len) {
        my $ngs = $_;
        for (0..$len - $ngs) {
            my $ngram = substr($word, $_, $ngs);
            push(@aNgrams, $ngram);
        }
    }
    return @aNgrams;
}
You can set the number of buckets for a hash like so:
keys(%hash) = 128;
The number will be rounded up to a power of two.
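For example, a minimal sketch (the 60_000 figure is just an assumed estimate of the final key count, not something taken from your data):

use strict;
use warnings;

my %hNgramsToWord;
keys(%hNgramsToWord) = 60_000;   # presize hint; Perl rounds this up to 65536 buckets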
That said, it is very unlikely that the slowdown you see is due to excess hash collisions, since Perl will dynamically expand the number of buckets as needed. And since 5.8.2, it will even detect pathological data that results in a given bucket being overused and reconfigure the hashing function for that hash.
Show your code, and we will likely be able to help find the real problem.
A demonstration of a large number of hash keys (don't let it continue till you are out of memory...):
use strict;
use warnings;

my $start = time();
my %hash;
$SIG{ALRM} = sub {
    alarm 1;
    my $elapsed = (time() - $start) || 1;   # avoid division by zero on the first tick
    printf(
        "%.0f keys/s; %d keys, %s buckets used\n",
        keys(%hash) / $elapsed,
        scalar(keys(%hash)),
        scalar(%hash)   # reports used/total buckets on perls before 5.26
    );
};
alarm 1;
$hash{rand()}++ while 1;
Once there are a LOT of keys, you will notice a perceptible slowdown when it needs to expand the number of buckets, but it still maintains a pretty even pace.
Looking at your code, the more words that are loaded, the more work it has to do for each word: for every ngram, the existing array of words is copied out of the hash and then copied back in, so each insertion gets slower as those lists grow.
You can fix it by changing this:
my @aWords;
if (defined($hNgramsToWord{$ngram})) {
    @aWords = @{$hNgramsToWord{$ngram}};
}
push(@aWords, $word);
$hNgramsToWord{$ngram} = \@aWords;
to this:
push @{ $hNgramsToWord{$ngram} }, $word;
No need to copy the array twice. No need to check if the ngram already has an entry - it will autovivify an array reference for you.
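With that change, the whole sub reduces to something like this sketch:

sub AddToNgramHash($) {
    my $word = shift;
    foreach my $ngram (MakeNgrams($word)) {
        # First use of $hNgramsToWord{$ngram} autovivifies an array reference.
        push @{ $hNgramsToWord{$ngram} }, $word;
    }
    return scalar keys %hNgramsToWord;
}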