Better to use URL (long string) for primary key, or a shorter serial integer primary key?
Say I'm storing webpages in PostgreSQL. Is it more convenient to use the URL of the webpage as the primary key, or to create a more succinct SERIAL integer primary key? What is the recommended approach for a case like this?
For what purpose are you storing the webpages?
How you cache webpages depends on why you are caching them. The first thing that comes to mind is that URLs can change. Do you want your record to change its primary key as well? Or would a new URL be a new record?
With a few exceptions, it is almost always better for the PK to have no meaning outside of being a reference to that row in the database, i.e., a surrogate key. Then you can put a unique constraint on the URI field if you'd like. If nothing else, it keeps anything that references the table from having to also hold a copy of the string. And if you later need to obfuscate the string, pull it into another table for analytics or restructuring purposes, or anything else along those lines, the surrogate key will be of greater benefit than the natural key.
I also generally prefer to avoid keys that pack several meaningful parts into a single value, which a plain-text URI does in abundance, since it can generally be broken into components (scheme, host, path, query string).
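To make the surrogate-key layout concrete, here is a minimal sketch. It uses SQLite through Python's standard library only so that it runs without a server; in PostgreSQL the `id` column would be declared `SERIAL` (or as an identity column) instead. The table and column names (`webpage`, `link`, `from_page`, `to_page`) are made up for illustration and are not from the question.

```python
import sqlite3

# SQLite stands in for PostgreSQL so the sketch is runnable as-is.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE webpage (
        id   INTEGER PRIMARY KEY,   -- surrogate key, no meaning outside the database
        url  TEXT NOT NULL UNIQUE,  -- natural key stays unique, but is not the PK
        body TEXT
    );
    -- Hypothetical referencing table: it stores small integers, not URL strings.
    CREATE TABLE link (
        from_page INTEGER NOT NULL REFERENCES webpage(id),
        to_page   INTEGER NOT NULL REFERENCES webpage(id)
    );
""")

page_id = conn.execute(
    "INSERT INTO webpage (url, body) VALUES (?, ?)",
    ("http://example.com/old", "<html></html>"),
).lastrowid
conn.execute(
    "INSERT INTO link (from_page, to_page) VALUES (?, ?)", (page_id, page_id)
)

# The URL changes: one row in one table is updated, and every
# reference to the page remains valid because the id did not change.
conn.execute(
    "UPDATE webpage SET url = ? WHERE id = ?",
    ("http://example.com/new", page_id),
)

linked_url = conn.execute(
    "SELECT w.url FROM link l JOIN webpage w ON w.id = l.to_page"
).fetchone()[0]
print(linked_url)
```

With the URL as the primary key, the same rename would also have to touch every row in `link` (or rely on `ON UPDATE CASCADE`).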
Another thing you may want to consider is the performance hit you will take when using a long string field in joins, indexes, and conditions: string keys are larger to store and slower to compare than integers. I must agree with @dclelements and @Neil in recommending an integer PK field.
Another consideration is that a URL PK cannot be auto-incremented: the application supplies the key value itself, so you will have to handle the higher probability of duplicate inserts into the table.
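One possible way to handle duplicate inserts, sketched under assumptions: keep an integer surrogate key, put a unique constraint on the URL, and let the database skip rows that already exist. SQLite is used here only so the example runs without a server; the `ON CONFLICT (url) DO NOTHING` clause is also accepted by PostgreSQL 9.5 and later. The helper name `get_or_create_page` and the `webpage` table are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE webpage (id INTEGER PRIMARY KEY, url TEXT NOT NULL UNIQUE)"
)


def get_or_create_page(conn, url):
    """Return the surrogate id for url, inserting a row only if it is new."""
    # A duplicate URL hits the unique constraint and is skipped silently.
    conn.execute(
        "INSERT INTO webpage (url) VALUES (?) ON CONFLICT (url) DO NOTHING",
        (url,),
    )
    return conn.execute(
        "SELECT id FROM webpage WHERE url = ?", (url,)
    ).fetchone()[0]


first = get_or_create_page(conn, "http://example.com/")
second = get_or_create_page(conn, "http://example.com/")
row_count = conn.execute("SELECT COUNT(*) FROM webpage").fetchone()[0]
print(first == second, row_count)
```

Both calls return the same id and the table ends up with a single row, so callers never need to check for the URL themselves before inserting.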
Integer for a PK is better database design.