开发者

Memcache key generation strategy

Given function f1 which receives n String arguments, what would be considered better ,in terms of runtime performance, random key generation strategy for memcache?

Our Memcache client does internal md5sum hashing on the keys it gets:

   public class MemcacheClient {  
       public Object get(String key) {
            String md5 = Md5sum.md5(key)
            // Talk to memcached to get the Serialization... 
            return memcached(md5);
       }
   }

My usage scenarios are:

First option

    public static String f1(String s1, String s2, String s3, String s4) {
         String key = s1 +  s2 + s3 + s4;
         return get(key);
    }

Second option

    /**
     * Calculate hash from Strings
     *
     * @param objects vararg list of String's
     *
     * @return calculated md5sum hash
     */
    public static String stringHash(Object... strings) {
        if(strings == null) 
            throw 开发者_高级运维new NullPointerException("D'oh! Can't calculate hash for null");

        MD5 md5sum = new MD5();

//      if(prevHash != null)
//          md5sum.Update(prevHash);

        for(int i = 0; i < strings.length; i++) {
            if(strings[i] != null) {
                md5sum.Update("_"); 
                md5sum.Update(strings[i].toString()); // Convert to String...
                md5sum.Update("_");

            } else {
                // If object is null, allow minimum entropy  by hashing it's position
                md5sum.Update("_");
                md5sum.Update(i);
                md5sum.Update("_");
            }
        }

        return md5sum.asHex();
    }


    public static String f1(String s1, String s2, String s3, String s4) {
         String key = stringHash(s1, s2, s3, s4);
         return get(key);
    }

Note that the possible problem with the second option is that we are doing second md5sum (in the memcache client) on an already md5sum'ed digest result.

Thanks for reading, Maxim.

-- Edit Used MD5 utility source


"Better" in what sense? Why would you think the second option is "better"? It does more string concatentations, more MD5 hashes and just generally seems much less efficient than the first...


Just nitpicking, but you probably don't want random key generation, the key generation should be deterministic, but should generate a uniform distribution in the key space.

If you consider only accidental collisions, then the first approach is almost fine. You should prefix the strings with their length so you don't get collisions when a substring moves from one param to another. Given md5's pretty good avalanche properties that will ensure that accidental collisions are rare enough to be ignored.

But be careful with MD5 if you process user input, it has known collision attacks. If an untrusted user can pick some arbitrary bytes for the function parameters and returning a wrong result can have security implications, then you have a security hole. For instance if you use this to cache authorization info, an attacker could work out two sets of parameters that hash to a single value. One would access something public and the other accesses a protected service. Now just request authorization with the first set, get the authorization cached and then access the protected service with the other set, receiving a green light from the cached authorization.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜