Need a deterministic algorithm to generate a resource domain prefix for a given path in an evenly distributed fashion
I need t开发者_Python百科o generate a resource domain prefix based on a given path and configured number of resource domains in a deterministic fashion with good distribution. For example, if you pass it the path "/script/site.js" it returns "res#", where '#' is an integer between 0 and the configured amount of resource domains.
Using C# 3.0.
So far have this:
var resourceDomainCount = 4;
var hasher = System.Security.Cryptography.SHA1.Create();
var paths = new [] {
"/App_Themes.0.1.433.232/images/buttonsBackgrounds.jpg",
"/App_Themes.0.1.433.232/images/blahblah.jpg",
"/App_Themes.0.1.433.232/images/pagebg.gif",
"/App_Themes.0.1.433.232/site.css",
"/script/site.js",
"/App_Themes.0.1.433.232/images/different.jpg",
"/App_Themes.0.1.433.232/images/shadows.jpg",
"/Handlers/ImageHandler.ashx?type=l&id=123&s=g&index=0",
"/Handlers/ImageHandler.ashx?type=l&id=234&s=g&index=0",
"/Handlers/ImageHandler.ashx?type=l&id=345&s=g&index=0",
"/Handlers/ImageHandler.ashx?type=p&id=MyProduct&s=g&index=0",
"/Handlers/ImageHandler.ashx?type=p&id=WineGreat&s=g&index=0",
"/Handlers/ImageHandler.ashx?type=p&id=YayYay&s=g&index=0"
};
foreach (var path in paths) {
var pathHash = hasher.ComputeHash(Encoding.ASCII.GetBytes(path));
var singleByteHash = pathHash.Aggregate(0, (a, b) => a ^ b);
var random = new Random((int)singleByteHash);
var resourceDomainIndex = random.Next(0, resourceDomainCount);
(resourceDomainIndex + ": " + path).Dump();
}
Which produces the following:
3: /App_Themes.0.1.433.232/images/buttonsBackgrounds.jpg
0: /App_Themes.0.1.433.232/images/blahblah.jpg 1: /App_Themes.0.1.433.232/images/pagebg.gif 1: /App_Themes.0.1.433.232/site.css 3: /script/site.js 1: /App_Themes.0.1.433.232/images/different.jpg 3: /App_Themes.0.1.433.232/images/shadows.jpg 1: /Handlers/ImageHandler.ashx?type=l&id=123&s=g&index=0 1: /Handlers/ImageHandler.ashx?type=l&id=234&s=g&index=0 0: /Handlers/ImageHandler.ashx?type=l&id=345&s=g&index=0 2: /Handlers/ImageHandler.ashx?type=p&id=MyProduct&s=g&index=0 1: /Handlers/ImageHandler.ashx?type=p&id=WineGreat&s=g&index=0 0: /Handlers/ImageHandler.ashx?type=p&id=YayYay&s=g&index=0Not getting the distribution I want (there is only one instance of '2' there).
There are thousands of input strings that change all the time and the paths will be generated at runtime, e.g.: <a href="<%= GetPath("/script/site.js") %>">Link</a>
This is a lot simpler than hashing the string (well, presumably it's already a hash anyway) and seems to get a better distribution:
Math.Abs(path.GetHashCode()) % resourceDomainCount
Your sample is far, far too small. Randomness is generally like statistics. It's only borne out over a long period of time. Try rerunning your code over a 100 iterations, and see if each value comes up about the same amount of time. Then try again for 10,000 iteratons and notice that the actual number of results are closer percentage-wise. Then run it for 1 million iterations, and notice that it's even closer. It never needs to be exactly the same - otherwise it wouldn't really be random.
And if the results don't follow that pattern, then maybe you've got a problem with teh even distribution issue.
精彩评论