Does python-memcached support consistent hashing & binary protocol?
Python-memcached is the official supported memcached driver for Django.
Does it support
- Consistent hashing 开发者_StackOverflow
- Binary protocol
If it does, how do I use those features within Django? I couldn't find any documentation.
Looking at the _get_server
method on python-memcached v1.45, it seems it doesn't use consistent hashing, but a simple hash % len(buckets)
.
Same goes for binary protocol, python-memcache uses, as far as I can see in the source, only text commands.
You might be able to use this: http://amix.dk/blog/post/19370
It encapsulates python-memcache's Client class so keys are distributed using consistent hashing.
EDIT- I'm digging in python-memcached
1.4.5 source code, and it looks like it might actually support consistent hashing.
Relevant code:
from binascii import crc32 # zlib version is not cross-platform
def cmemcache_hash(key):
return((((crc32(key) & 0xffffffff) >> 16) & 0x7fff) or 1)
serverHashFunction = cmemcache_hash
-- SNIP --
def _get_server(self, key):
if isinstance(key, tuple):
serverhash, key = key
else:
serverhash = serverHashFunction(key)
for i in range(Client._SERVER_RETRIES):
server = self.buckets[serverhash % len(self.buckets)]
if server.connect():
#print "(using server %s)" % server,
return server, key
serverhash = serverHashFunction(str(serverhash) + str(i))
return None, None
Based on this code, it looks like it does implement the algorithm, unless cmemcache_hash
is not a meaningful name and that is not the real algorithm. (the now retired cmemcache does consistent hashing)
But I think the OP is referring to more "resilient" consistent hashing, e.g. libketama. I don't think there's a drop in solution out there for that out there, looks like you need to roll up your sleeves compile/install a more advanced memcached lib like pylibmc, and write a custom Django backend that uses that instead of python-memcached.
Anyhow, in either case, some remapping of keys will occur when you add/remove buckets to the pool (even with libketama, just less than with the other algorithms)
If you want a plug-and-play solution for django, use django-memcached-hashring
: https://github.com/jezdez/django-memcached-hashring.
It is an adapter around django.core.cache.backends.memcached.MemcachedCache
and the hash_ring
library.
Please check this sample python implementation of consistent hashing. consistent-hashingpython-memcachedcontinnum-circle
implementation principal : imagine a continnum circle with a number of replicated server points spread across it. When we add a new server, 1/n of the total cache keys will be lost
'''consistent_hashing.py is a simple demonstration of consistent
hashing.'''
import bisect
import hashlib
class ConsistentHash:
'''
To imagine it is like a continnum circle with a number of replicated
server points spread across it. When we add a new server, 1/n of the total
cache keys will be lost.
consistentHash(n,r) creates a consistent hash object for a
cluster of size n, using r replicas.
It has three attributes. num_machines and num_replics are
self-explanatory. hash_tuples is a list of tuples (j,k,hash),
where j ranges over machine numbers (0...n-1), k ranges over
replicas (0...r-1), and hash is the corresponding hash value,
in the range [0,1). The tuples are sorted by increasing hash
value.
The class has a single instance method, get_machine(key), which
returns the number of the machine to which key should be
mapped.'''
def __init__(self,replicas=1):
self.num_replicas = replicas
def setup_servers(self,servers=None):
hash_tuples = [(index,k,my_hash(str(index)+"_"+str(k))) \
for index,server in enumerate(servers)
for k in range(int(self.num_replicas) * int(server.weight)) ]
self.hash_tuples=self.sort(hash_tuples);
def sort(self,hash_tuples):
'''Sort the hash tuples based on just the hash values '''
hash_tuples.sort(lambda x,y: cmp(x[2],y[2]))
return hash_tuples
def add_machine(self,server,siz):
'''This mathod adds a new machine. Then it updates the server hash
in the continuum circle '''
newPoints=[(siz,k,my_hash(str(siz)+"_"+str(k))) \
for k in range(self.num_replicas*server.weight)]
self.hash_tuples.extend(newPoints)
self.hash_tuples=self.sort(self.hash_tuples);
def get_machine(self,key):
'''Returns the number of the machine which key gets sent to.'''
h = my_hash(key)
# edge case where we cycle past hash value of 1 and back to 0.
if h > self.hash_tuples[-1][2]: return self.hash_tuples[0][0]
hash_values = map(lambda x: x[2],self.hash_tuples)
index = bisect.bisect_left(hash_values,h)
return self.hash_tuples[index][0]
def my_hash(key):
'''my_hash(key) returns a hash in the range [0,1).'''
return (int(hashlib.md5(key).hexdigest(),16) % 1000000)/1000000.0
Now vbucket is coming for resolving consistent hashing with least impact in cache misses.
I have used Consistent hashing algorithm. The lost keys are 1/n of the total number of keys. This means the successful key fetch will be 6/7 *100 around 85%. here
精彩评论