Memcache key generation strategy

2022-12-27 04:06 问答作者：

Given function f1 which receives n String arguments, what would be considered better ,in terms of runtime performance, random key generation strategy for memcache?

Our Memcache client does internal md5sum hashing on the keys it gets:

   public class MemcacheClient {  
       public Object get(String key) {
            String md5 = Md5sum.md5(key)
            // Talk to memcached to get the Serialization... 
            return memcached(md5);
       }
   }

My usage scenarios are:

First option

    public static String f1(String s1, String s2, String s3, String s4) {
         String key = s1 +  s2 + s3 + s4;
         return get(key);
    }

Second option

    /**
     * Calculate hash from Strings
     *
     * @param objects vararg list of String's
     *
     * @return calculated md5sum hash
     */
    public static String stringHash(Object... strings) {
        if(strings == null) 
            throw 开发者_高级运维new NullPointerException("D'oh! Can't calculate hash for null");

        MD5 md5sum = new MD5();

//      if(prevHash != null)
//          md5sum.Update(prevHash);

        for(int i = 0; i < strings.length; i++) {
            if(strings[i] != null) {
                md5sum.Update("_"); 
                md5sum.Update(strings[i].toString()); // Convert to String...
                md5sum.Update("_");

            } else {
                // If object is null, allow minimum entropy  by hashing it's position
                md5sum.Update("_");
                md5sum.Update(i);
                md5sum.Update("_");
            }
        }

        return md5sum.asHex();
    }


    public static String f1(String s1, String s2, String s3, String s4) {
         String key = stringHash(s1, s2, s3, s4);
         return get(key);
    }

Note that the possible problem with the second option is that we are doing second md5sum (in the memcache client) on an already md5sum'ed digest result.

Thanks for reading, Maxim.

-- Edit Used MD5 utility source

"Better" in what sense? Why would you think the second option is "better"? It does more string concatentations, more MD5 hashes and just generally seems much less efficient than the first...

Just nitpicking, but you probably don't want random key generation, the key generation should be deterministic, but should generate a uniform distribution in the key space.

If you consider only accidental collisions, then the first approach is almost fine. You should prefix the strings with their length so you don't get collisions when a substring moves from one param to another. Given md5's pretty good avalanche properties that will ensure that accidental collisions are rare enough to be ignored.

But be careful with MD5 if you process user input, it has known collision attacks. If an untrusted user can pick some arbitrary bytes for the function parameters and returning a wrong result can have security implications, then you have a security hole. For instance if you use this to cache authorization info, an attacker could work out two sets of parameters that hash to a single value. One would access something public and the other accesses a protected service. Now just request authorization with the first set, get the authorization cached and then access the protected service with the other set, receiving a green light from the cached authorization.

继续阅读：algorithm memcached

Memcache key generation strategy

First option

Second option

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

First option

Second option

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集 河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？