Pros/Cons of and ways to implement globally unique identifier in relational database?
Regarding the first part of my question: I was recently asking myself what are the benefits and trade-offs of having a unique identifier for certain tables in a relational database. Just as an example, the Facebook (FB) Graph API allows to fetch different types of objects such as "Users", "Events", "Pages", etc. using the same URL, e.g https://domain/251906384206 returns an object of type "Event" whereas https://domain/195466193802264 returns an object of type "Group".
What is the benefit of this approach compared to providing a less "generic" API, one which would be used in this way: https://domain/event/251906384206 or https://domain/group/195466193802264. In this case, a similar identifier might be used for different objects types because each object type has it's identifier scope.
Regarding the second part of the question: What are the options for implementing a globally unique identifer?
Two options that come to my mind are:
Using an inheritance-based approach (table-per-class, single table, etc.). Assuming a table-per-class approach is used (super table contains unique identifier as primary key only, sub table representing object type contains same indentifier as super table and additional data), joins are required between super and sub table which seems to scale badly because the super table becomes a bottleneck?
Providing a table with 3 columns, containing
- unique identifier,
- object type specifc primar key, and
- table name.
Additional tables per object type containing a column referencing the unique identifier as foreign key. Each object type specific table has it's own primary key scope.
Both approaches would allow to provide a generic API like the FB API mentioned above. The second approach would allow to use object table specific primary keys internally and to expose the globally unique identifier only. However, if a global unique identifier might be u开发者_高级运维sed internally, the second approach would require a join as well.
Are there any experiences regarding pros/cons of a globally unique identifier and what are the best practices for implementing it?
Both of your proposed ways of implementing global identifier involve joins of big tables and effective doubling of the number of records in your database (each objects exists on its own but so does his parent/record with it's global ID).
I have a feeling it would be better to enforce global IDs in the application/data access layer. This can be done trivially by enforcing that IDs for each specific type of object come only from a subset of possible IDs. You could, for instance, reserve last/first x bits of all IDs to specify object type. Remaining part of IDs, would be the "actual ID".
If you're afraid of errors while assigning IDs for spefic table, you can add a check constraint that will enforce the ID is correct (e.g. ID < 4000 AND ID > 10000). If you are concerned for the bits/bytes wasted for the type of object in its identifier, you could expose the global ID only in your database access API, which would concatenate objects' ID (actually stored in a table) with their type IDs (derived from the object type).
"A problem well stated, is a problem already half solved".
Seems to me that you are mixing several concepts. You check other database apps., but it seems you got more confused instead of more informed.
You have several objects of different classes, and you want to know how to store them in a database. This is usually called by the "fancy name" of the Object Relational Mapping (O.R.M.).
Additionally, you want to use a Global Unique Identifier (G.U.I.D.) to identify an object both as Business / Programming Object and a row in a table.
Additionally, you also, want to use a G.U.I.D. to identify a class or object of certain type.
Let's say you are building an app. where you have several objects. There are several classes of the objects such as "Users", "Events", "Pages", and others. You can have several objects of the same class / type, but you need a way to identify one from another. To identify "John Doe" from Michigan, from "John Doe" form Queensland. Lets say that your objects are going to use a property of type G.U.I.D.
So let's suppouse you create a table for each class ("user" for "Users", table standard id. is singular and lowercase, altought you may ignore it, "event" for "events", and so on). Each table has several fields that represent the properties of the each object. So "user" will have a field like "user_key GUID", and maybe "user_name varchar(100)", and "user_birthdate datetime". The same goes for the other tables.
I have used "supertable" but for only a very specific, not common apps. I don't think you need a table that mixes "users", "events", "pages". I had a case where we had a supertable "customers", plus "company" and "person" subtables with specific additional fields. Sometime, we had to check sales for all customers, and make joins with the "customers" table. Sometimes, we had to offer a corporate discount for products, and browse the "company" subtable.
In case you want this Generalization / "IS a" supertable, you don't need to have a different field for the supertable primary key and detail table primary key, can be of the same type.
I suggest avoid at all cost use composite / compound keys ("master key" plus "other" fields), use a single field primary key. I also suggest that assign the G.U.I.D. key using programming, not in the database.
The G.U.I.D. uses more memory and disk space, than an integer key, but, its very fast and easy to get a key that is very difficult to be duplicated.
Again, you question is more of how to represent objects in a database, than the usage of G.U.I.D.
精彩评论