Should I use a code in my lookup table
I am woking on an Orable database and I am adding a couple of lookup tables.
The general question is should the lookup table contain a code and a description and the code be the FK back to the main table, or should the lookup table contain only a description 开发者_运维问答and that be the FK back to the main table.
I am arguing for the code/description pair. I feel that if I have type = Contractor
and code = CN
a stored proc should say where type='CN'
rather than only having the type=Contractor
and no code and saying this in the stored proc: where type='Contractor'
Because what if I want to display: General Contractor
to the user rather than Contractor
. I would then have to change the stored proc. I feel that I should not have to do this. (changing a stored proc requires recompilation in dev, migration to test, retesting by the clients, and migrating a prod which requires going through a change control proccess which involves a two week waiting period; whereas modifying a record in a table does not require any of this)
My data modeler wants to use only the description. His main argument is that it would require an unnessesary join.
Which way should it be done? And if if should be done the code/description way how do i convince the data modeler?
Thanks!
type_cd type_dsc
CN Contractor
IN Inspector
Summarizing all of the answers, I think there are four alternatives for the lookup table:
Alternative 1:
• Description (primary key, a longer varchar2 column)
Alternative 2:
• Code (primary key, a short varchar2 column)
• Description (not null, a longer varchar2 column)
Alternative 3:
• Id (a meaningless primary key, an integer value derived from a sequence)
• Description (not null, a longer varchar2 column)
Alternative 4:
• Id (a meaningless primary key, an integer value derived from a sequence)
• Code (unique key, a short varchar2 column)
• Description (not null, a longer varchar2 column)
The primary key column will be in the main table with a foreign key constraint on top.
Some characteristics per alternative:
Alternative 1:
• No join is required when querying the main table
• Clear meaning when doing ad-hoc queries on main table
• Requires more storage for main table
• Index on main table will be much bigger than in other alternatives
• Updating a Description value means maintenance trouble and possibly application downtime.
Alternative 2:
• Join is required when you want to retrieve the description value
• Join in not required if you want to filter on certain lookup values: you can use the code value for that.
• Pretty clear meaning when doing ad-hoc queries on main table
• Minimal additional storage requirements for main table
• Index on main table will be small.
• Updating a Description value is easy, however code is usually an abbreviation from the description. When updating a Description value, the code can become confusing.
Alternative 3:
• Join is required when you want to retrieve the description value
• When filtering on certain lookup values, you'd have to use the Description values in your queries as the Id's are meaningless.
• Meaning is not clear when doing ad-hoc queries on main table
• Minimal additional storage requirements for main table
• Index on main table will be small.
• Updating a Description value is easy and doesn't cause confusion as with Code values
Alternative 4:
• Join is required when you want to retrieve the description value
• Join is required when filtering on certain lookup values, you would use the Code value in the lookup table.
• Meaning is not clear when doing ad-hoc queries on main table
• Minimal additional storage requirements for main table
• Index on main table will be small
• Updating a Description value is easy and you can also very easily update the Code value as well to make it resemble the Description value. You may have to revisit some of your code when doing this, though.
Personal opinion:
I would look at how I plan to use the main table and the lookup table. Which queries will be important and have to run efficiently? Will the values ever change?
My personal choice would be alternative 2 or 4. I'd use alternative 2 if I was absolutely sure that the code value can never change. And this is rare. Country codes change, social security numbers change. Currency codes change, etcetera. So, most of the time, I'd choose alternative 4. I'd not be so concerned for an extra join, especially because the lookup table is a small table.
But: choose an alternative that suits your requirements.
Please feel free to edit the text when you know some more characteristics of an alternative.
Regards,
Rob.
Code/Description. That code value is (I assume) going to be a smaller, more efficient integer. Furthermore, you don't want to be in a situation where you need to update all your foreign keys just because a text description changes at some time in the future.
EDIT: Based on the sample code you just added, I would encourage you to make your code value an integer value rather than a string like 'CN', 'IN'. You want your key value to be agnostic to any "meaning" associated to the description. 'CN' still implies 'Contractor' and if/when that description changes to be 'External Resource' then 'CN' is going to be misleading.
Well, this would depend on how "standard" those codes are.
Consider a lookup table like this one:
Code Description
------------------
USD United States Dollar
GBP Pound Sterling
AUD Australian Dollar
EUR Euro
For this, I would use char(3)
for the Code
and make it the primary key. Your codes seem to be char(2) -- neat, small; smaller than an integer.
So my guess would be to use the Code
as the PK in the lookup table and the "main table" would have Code
as a FK to the lookup table.
And if your codes are not very standard and subject to change, integer is preferred.
There are existing standards and encoding schemes for a lot of things, that were thought out be people who are smarter than me and had a lot more time to think about it. For example, the ISO standard covers Sex Codes (iso 5218), country codes (iso 3166), language codes (iso 639), currency codes (iso 4217) etcetera. I bought Joe Celko's Data, Measurements and Standards in SQL last year, and it really surprised me just how many officially maintained ready-made standards and encoding schemes there are out there.
Ok fine, so once in a while some country abandon their bananas in favour of EUR/USD, and now you have to rewrite your entire application? No, you'd have to spend a few hours writing a script to merge/split whatever codes was changed. Big deal. Why don't you fix a few bugs during that same release while your're at it?
Personally, I use short character codes for almost everything I have to write code against, or when I need to assign behaviour depending to some "type code". The code is so intimately tied to the typecode, so why make it harder than it has to? The resulting code is easier to read, and it executes faster because I need less joins. For everything else (basically all user-generated), I use integer surrogates.
I have "only" worked with databases for 11 years, but I haven't seen many cases when the "name changed" so significantly that the code became misleading. A type code of "contractor", can't change into "Human resource manager" or "Vice president". That is a new code. It might split into "internal/external resource", but that would require code changes as well, in which case I don't see a problem with adding a few hours of data conversion to the project budget?
Finally, at some point you have to commit to a value that you put in your code. You can use whatever value you want, but it still means the same thing.
I've seen all of the following:
where type = 1 /* contractor */
vs
int type_code = configfile.lookup("sqlcodes.contractor");
...
where type = :type_code
vs
from sometable
join contract_types using(type_id)
where contract_types.type_name = 'Contractor';
...but I still don't see the benefit over just:
where type = 'CN'
The point I'm trying to make is: When we spend 80 hours on development, how the hell can 4 hours of database activities NOT fit into the project budget?
use a numeric id value and a description. Store the id in the main table as the FK.
strings are lousy FK values, and basic normalization will tell your data modeler that you want the flexibility of changing the string once in the lookup table and not have to change it wherever it is referenced.
Use a mnemonic (e.g. CN for Contractor, etc.) that is never changed. Let the UI show the description. A small code table will probably fit in one or two blocks and will therefore usually be found in the cache so lookups will be cheap.
Most importantly, future developers and people (like me) who come later and have to map this data to other systems will thank you, because it means that 80% of the time you can just query the table and understand it intuitively.
When I see a table like this, I just want to scream:
ADDRESS_ID
HOUSE_NUMBER
STREET_NAME
STREET_TYPE_ID
LOCALITY_ID
SELECT * FROM addresses WHERE street_type_id = 10053;
Codes never have to change (they're internal to the system, the end users shouldn't see them). Descriptions sometimes change, but usually not to a great degree. The occasions when a description changes so much that the mnemonic no longer makes sense, in my experience, are very rare.
look, depending on key distribution - for most small lookup tables any queries joining to it are going to do a full table scan against the lookup and hash-join the key anyway - so numeric versus character is probably a complete non-issue for performance of the join.
The question really is - do you need the join?
I mean - if you are storing the lookup to be used as the display value by the UI, then yes use a lookup and join to it so you can update the display value easily.
If, on the other hand, you are using resource files for a multilingual app and you are using the returned code as the lookup, AND it is a small set of codes that is not expected to change (Gender_Code = 'M'ale or 'F'emale or 'U'nknown for example) - then make the codes meaningful, use a check constraint on the field to control the values, and don't even bother with a lookup table because you know them by the code and the UI will figure out how to display them.
My suggestion is to use an int ID and a char/varchar description.
Use the ID in your queries and only link to the description when you need to display the description.
Do not worry that the ID will not look like the description. This is the way it is supposed to work. You want a non-significant ID so that no one will guess the meaning of 'CH' or 'EX', etc. Add comments to the code explaining what the ID is supposed to mean.
You want to be able to change the description at any time without breaking your code. You do NOT want to have to fix all the codes when they change.
Also, you can add one or more groups to the descriptions table. If you have several contractor types you can add a group column that indicates the type. Then you can link to the group description table and return all rows where the group is Contractor. Of course this group should have a lookup table with an ID and a description so you can change the displayed names of the lookup groups.
Tell the data modeler that you want to put the data in related tables. That is why it is called a relational database.
精彩评论