Consistent String#hash based only on the string's content
GOAL: Map every URL handled by a server to 0, 1, 2, or 3, distributing as uniformly as possible.
While the documentation for ruby's String#hash method says it will "return a hash based on the string‘s length an开发者_高级运维d content," this clearly isn't the whole story. A given string's hash is not consistent across invocations of the interpreter:
$ irb
ruby-1.9.2-p180 :001 > "foo".hash
=> 360517580588231756
ruby-1.9.2-p180 :002 > ^D
$ irb
ruby-1.9.2-p180 :001 > "foo".hash
=> -2716152678666510148
This means a particular string's hash value may differ across, say, servers. Rails uses String#hash
internally to map a URL path to one of four asset hosts (if the app's asset_host is so configured), but this feature is a lot less efficient than it could be because of the cross-machine inconsistencies; different servers may map the same URL to different asset hosts, reducing the effectiveness of caches, clouding skies, cooling cups of tea prematurely, besmirching the reputations of otherwise fine programmers.
Can you suggest an alternate hash function that could effectively and speedily distribute hashes across a typical app's URL space, preferably one that produces a Fixnum since, in the end, I'll want to map it into one of four asset hosts?
there are lot of such functionality in ruby's digest module: http://ruby-doc.org/stdlib/libdoc/digest/rdoc/index.html
simple example:
require 'digest/sha1'
Digest::SHA1.hexdigest("some string")
The easiest (and consistent) way may be this (and it's fast):
"https://www.example.com/abc/def/123?hij=345".sum % 4
That will always produce an integer 0 - 3, is quite fast, and should be fairly well distributed (though I haven't actually run tests on distribution).
There is tiny library xxHash:
XXhash.xxh32('qwe') #=> 2396643526
XXhash.xxh64('qwe') #=> 9343136760830690622
Maybe it will have more collisions but it is 10x faster than SHA1:
Benchmark.bm do |x|
n = 100_000
str = 'qweqweqwe'
x.report('xxhash32') { n.times { XXhash.xxh32(str) } }
x.report('xxhash64') { n.times { XXhash.xxh64(str) } }
x.report('hexadigest') { n.times { Digest::SHA1.hexdigest(str) } }
end;1
# user system total real
# xxhash32 0.020000 0.000000 0.020000 ( 0.021948)
# xxhash64 0.040000 0.000000 0.040000 ( 0.036340)
# hexadigest 0.240000 0.030000 0.270000 ( 0.276443)
You can try to_i(36).
"Hash me please :(".to_i(36)
=> 807137
精彩评论