Is COMB GUID a good idea with Rails 3.1 if I use GUIDs for primary keys?
I'm using Rails 3.1 with PostgreSQL 8.4. Let's assume I want/need to use GUID primary keys. One potential drawback is index fragmentation. In MS SQL, a recommended solution for that is to use special sequential GUIDs. One approach to sequential GUIDs is the COMBination GUID that substitutes a 6-byte timestamp for the MAC address portion at the end of the GUID. This has some mainstream adoption: COMBs are available natively in NHibernate (NHibernate/Id/GuidCombGenerator.cs).
I think I've figured out how to create COMB GUIDs in Rails (with the help of the UUIDTools 2.1.2 gem), but it leaves some unanswered qu开发者_如何学Cestions:
- Does PostgreSQL suffer from index fragmentation when the PRIMARY KEY is type UUID?
- Is fragmentation avoided if the low-order 6 bytes of the GUID are sequential?
- Is the COMB GUID as implemented below an acceptable, reliable way to create sequential GUIDs in Rails?
Thanks for your thoughts.
create_contacts.rb
migration
class CreateContacts < ActiveRecord::Migration
def up
create_table :contacts, :id => false do |t|
t.column :id, :uuid, :null => false # manually create :id with underlying DB type UUID
t.string :first_name
t.string :last_name
t.string :email
t.timestamps
end
execute "ALTER TABLE contacts ADD PRIMARY KEY (id);"
end
# Can't use reversible migration because it will try to run 'execute' again
def down
drop_table :contacts # also drops primary key
end
end
/app/models/contact.rb
class Contact < ActiveRecord::Base
require 'uuid_helper' #rails 3 does not autoload from lib/*
include UUIDHelper
set_primary_key :id
end
/lib/uuid_tools.rb
require 'uuidtools'
module UUIDHelper
def self.included(base)
base.class_eval do
include InstanceMethods
attr_readonly :id # writable only on a new record
before_create :set_uuid
end
end
module InstanceMethods
private
def set_uuid
# MS SQL syntax: CAST(CAST(NEWID() AS BINARY(10)) + CAST(GETDATE() AS BINARY(6)) AS UNIQUEIDENTIFIER)
# Get current Time object
utc_timestamp = Time.now.utc
# Convert to integer with milliseconds: (Seconds since Epoch * 1000) + (6-digit microsecond fraction / 1000)
utc_timestamp_with_ms_int = (utc_timestamp.tv_sec * 1000) + (utc_timestamp.tv_usec / 1000)
# Format as hex, minimum of 12 digits, with leading zero. Note that 12 hex digits handles to year 10889 (*).
utc_timestamp_with_ms_hexstring = "%012x" % utc_timestamp_with_ms_int
# If we supply UUIDTOOLS with a MAC address, it will use that rather than retrieving from system.
# Use a regular expression to split into array, then insert ":" characters so it "looks" like a MAC address.
UUIDTools::UUID.mac_address = (utc_timestamp_with_ms_hexstring.scan /.{2}/).join(":")
# Generate Version 1 UUID (see RFC 4122).
comb_guid = UUIDTools::UUID.timestamp_create().to_s
# Assign generted COMBination GUID to .id
self.id = comb_guid
# (*) A note on maximum time handled by 6-byte timestamp that includes milliseconds:
# If utc_timestamp_with_ms_hexstring = "FFFFFFFFFFFF" (12 F's), then
# Time.at(Float(utc_timestamp_with_ms_hexstring.hex)/1000).utc.iso8601(10) = "10889-08-02T05:31:50.6550292968Z".
end
end
end
- Does PostgreSQL suffer from index fragmentation when the PRIMARY KEY is type UUID?
Yes, it's to be expected. But if you're going to use the COMB strategy that won't happen. The rows will be always in order(that's not entirely true, but bear with me).
Also, the performance between native pgsql UUID vs VARCHAR is not all that different. Another point to consider.
- Is fragmentation avoided if the low-order 6 bytes of the GUID are sequential?
In my test I've found that UUID1(RFC 4122) is sequential, there's already a timestamp added in the generated uuid. But yes, adding a timestamp in the last 6 bytes will reassure that ordering. That's what I did anyway, because apparently the timestamp already present is not guarantee of order. More about COMB here
- Is the COMB GUID as implemented below an acceptable, reliable way to create sequential GUIDs in Rails?
I'm not using rails, but I'll show you how I did it in django:
import uuid, time
def uuid1_comb(obj):
return uuid.uuid1(node=int(time.time() * 1000))
Where node
is a 48-bit positive integer identifying the hardware address.
About your implementation, one of the main advantages of using uuid's is that you can safely generate them outside the database, so, using a helper class is one valid way to do it. You can always use an external service for uuid generation like snowflake, but it may be premature optimizacion at this point.
精彩评论