Is converting database string enums to integers worth it?
There are two ways to store enum types in database: as a string or as an integer.
Saving the enumeration ( sex = {male,female}
, account_type = {regular,pro,admin}
, etc. ) as strings makes things more readable but requires more space than integers.
On the other hand, integers require mapping the enums in and out of the database. As a benefit, case-sensitivity is handled outside of the database with integers.
Assuming both are indexed, is doing the integer conversion generally worth it? How much faster is the lookup with integers?
Example
Perhaps a concrete example could help to visualize things. Lets take the above account_type with a database of 100,000 users.
String enum
Assumin开发者_如何学Cg 8-bit fixed length CHAR type
7*100000*8/8 = 700000 bytes
Integer enum
Assuming 8-bit TINYINT integers
100000*8/8 = 400000 bytes
Seems like the size is almost half with integer enums. Also need to concider the indexes.
The answer is, as you would expect, it depends.
The larger the database the more significant the space savings - not only on disk but also in network IO and computation.
Personally, I would store integers instead of textual values, unless there is direct DB supprt for enumerations (as MySQL does).
The ints will take less memory if size of database becomes an issue.
It depends if you are returning values from the database directly without going through your code layer (e.g. some form of translation). If you are then you would need the string values in the database (however you could store them as lookups in a related table)
There is always the question of whether the DB will be looked at by humans, as opposed to through applications that do the conversion. If a person is looking at the DB for any reason, text is better - this is especially the case if there are DBAs who may not have access to the code to see the enum conversion.
If the size of stored data is more important, then converting to ints is a better idea. But for this improved space, you lose readability. It depends on what is the most important factor.
Of course, you could include SProcs or Views or suchlike to look at the stored integer data and convert it to string value, which would make sense if you need to balance between the two.
But as Oded said - there is no one simple answer. Every situation will be slightly different.
Actually, what you probably want to do is create a mapping table in your database, regardless.
This takes care of a number of things -
1) You assign an Id column as usual, then assign foreign keys to the appropriate columns. This prevents nonsense values from being inserted. This also deals with normalization issues.
2) With the mapping table, you can then use views to construct database-only selections, which simply swap out the id value for the requisite text string.
3) With a mapping table, it also becomes easier to deal with internationalization issues (note: this does not necessarily mean simpler, exactly). Here is how I would set up the tables for this:
Gender_Mapping
Id | Enum_Mapped_Value | DBA_Readable_Description
Gender_Description
Id | Gender_Mapping_Id | Language_Id | Language_Specific_Description
For retrieval issues, (Enum_Mapped_Value)
and (Gender_Mapping_Id, Language_Id)
should be unique (or returned unique from a view, at least).
Enum_Mapped_Value
should be some character code (maybe 5 characters?) that is used to map the enum to the database. Do not use either the ordinal value, or the name of the enum itself - use a constructor-assigned internal value; Otherwise, future developers may reorder the enums, or rename them - but interal values are far more likely to be left alone.
Language_Id
should map as a foreign key to a Language_Mapping
table of some sort, if you ever plan on dealing with more than one language.
精彩评论