Generate unique hashes for Django models
I want to use a unique hash for each model instance rather than sequential IDs.
I implemented the following function so that I can reuse it easily across all models.
    import random,hashlib
    from base64 import urlsafe_b64encode

    def set_unique_random_value(model_object,field_name='hash_uuid',length=5,use_sha=True,urlencode=False):
        while 1:
            uuid_number = str(random.random())[2:]
            uuid = hashlib.sha256(uuid_number).hexdigest() if use_sha else uuid_number
            uuid = uuid[:length]
            if urlencode:
                uuid = urlsafe_b64encode(uuid)[:-1]
            hash_id_dict = {field_name:uuid}
            try:
                model_object.__class__.objects.get(**hash_id_dict)
            except model_object.__class__.DoesNotExist:
                setattr(model_object,field_name,uuid)
                return
I'm looking for feedback: how else could I do this? How can I improve it? What is good, bad, and ugly about it?
I do not like this bit:
    uuid = uuid[:5]
Even in the best case (the values are uniformly distributed), the probability of a collision passes 0.5 after only about a thousand elements!
This is the birthday problem. In brief, the probability of a collision exceeds 0.5 once the number of elements is on the order of the square root of the number of possible labels (roughly 1.18 times the square root).
Five hex characters give you 0x100000 = 16^5 ≈ 10^6 labels (different values), so after roughly 1,000 to 1,200 generated values you should expect to start seeing collisions.
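To make the birthday estimate concrete, here is a minimal sketch that computes the exact collision probability for uniformly drawn labels. The function name `collision_probability` is only illustrative and not part of the original code; it assumes every label is equally likely.

```python
import math


def collision_probability(num_labels: int, num_items: int) -> float:
    """Probability that at least two of num_items uniform draws
    from num_labels possible values are equal."""
    if num_items > num_labels:
        return 1.0  # pigeonhole: a repeat is guaranteed
    # P(no collision) = product over i of (1 - i / N); sum logs for stability.
    log_no_collision = sum(
        math.log1p(-i / num_labels) for i in range(num_items)
    )
    return 1.0 - math.exp(log_no_collision)


LABELS = 16 ** 5  # five hex characters

for n in (500, 1000, 1200, 2000):
    print(n, round(collision_probability(LABELS, n), 3))
```

With five hex characters the probability is a little under 0.4 at 1,000 items and crosses 0.5 near 1,200, which matches the square-root rule of thumb.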
Even if you increase length (say, to -1, which keeps almost the whole string), you still have a problem here:
    str(random.random())[2:]
You will start having collisions after about 3 * 10^6 values (the same calculation applies).
I think your best bet is to use a UUID, which is far more likely to be unique. Here is an example:
    >>> import uuid
    >>> uuid.uuid1().hex
    '7e0e52d0386411df81ce001b631bdd31'
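If a shorter, URL-friendly form is wanted (the question's urlencode option hints at that), a UUID can be base64-encoded instead of truncated. This is only a sketch: `short_uuid` is a hypothetical helper name, and it uses `uuid.uuid4` (random) rather than the `uuid1` shown above, which is an assumption about what fits the use case.

```python
import uuid
from base64 import urlsafe_b64encode


def short_uuid() -> str:
    """Return a random UUID encoded as 22 URL-safe characters."""
    raw = uuid.uuid4().bytes  # 16 bytes, 122 of the 128 bits are random
    # 16 bytes encode to 24 base64 characters, the last two being '=' padding.
    return urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


print(uuid.uuid4().hex)  # 32 hex characters
print(short_uuid())      # 22 characters from A-Z, a-z, 0-9, '-' and '_'
```

Unlike slicing the hex string to five characters, this keeps all of the UUID's bits, so the collision estimate above no longer applies at realistic table sizes.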
Update: If you do not trust the math, just run the following sample to see the collision:
    >>> len(set(hashlib.sha256(str(i)).hexdigest()[:5] for i in range(0,2000)))
    1999 # it would print 2000 if there were no collisions
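The snippet above is written for Python 2, where `sha256` accepts a `str`. A minimal Python 3 sketch of the same check follows; `distinct_prefixes` is an illustrative name, and the only assumption is that the integers are encoded as ASCII before hashing.

```python
import hashlib


def distinct_prefixes(count: int, length: int = 5) -> int:
    """Count distinct truncated SHA-256 hex digests of str(0)..str(count - 1)."""
    return len({
        hashlib.sha256(str(i).encode("ascii")).hexdigest()[:length]
        for i in range(count)
    })


# Any result below 2000 means at least two inputs shared a 5-character prefix.
print(distinct_prefixes(2000))
```

Raising `length` makes the count approach the number of inputs, which is an easy way to see how quickly truncation erodes uniqueness.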
The ugly:
    import random
From the documentation:
    "This module implements pseudo-random number generators for various distributions."
If anything, please use os.urandom:
    "Return a string of n random bytes suitable for cryptographic use."
This is how I use it in my models:
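    import os
    from binascii import hexlify

    from django.db import models


    def _createId():
        # 16 random bytes from the OS, hex-encoded to 32 characters.
        # decode() keeps the value a text string on Python 3 as well.
        return hexlify(os.urandom(16)).decode('ascii')


    class Book(models.Model):
        id_book = models.CharField(max_length=32, primary_key=True, default=_createId)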
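On Python 3.6 and later, the standard library's `secrets` module draws on the same kind of OS-provided randomness and returns text directly. The sketch below is an alternative under that assumption, not the answerer's code; `create_id` is a hypothetical name.

```python
import secrets


def create_id() -> str:
    """Return 32 hex characters, i.e. 128 random bits, as a text string."""
    return secrets.token_hex(16)


print(create_id())
```

A function like this could be passed as the `default` of the `CharField` in the same way as `_createId`.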
Django 1.8+ has a built-in UUIDField. Here's the suggested implementation, using the standard library's uuid module, from the docs:
    import uuid
    from django.db import models

    class MyUUIDModel(models.Model):
        id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
        # other fields
For older Django versions you can use the django-uuidfield package.
Use your database engine's UUID support instead of making up your own hash. Almost everything beyond SQLite supports UUIDs, so there's little reason not to use them.