Database Design: Is it bad to keep delimited strings in a database [duplicate]
I've worked on projects before that would store fields in a comma-delimited or p开发者_如何转开发ipe-delimited string as a field, which might represent options or something simliar. I was wondering if it was considered bad database design to do this, and relational tables should always be used, or sometimes is it acceptable to store data in this fashion?
Usually, that's a bad idea.
However, there is a balance between normalization and performance.
If the list isn't parsed and is simply displayed as-is, then it would be preferable to store the comma-separated list, but if the list is parsed for individual elements, you should stick to a normalized database schema.
This depends on you. A database field should contain atomic data, that is, the whole value is meaningful but part of the value is not. For example, I might decide to store a person's name in a field called fullname, so the values are John Smith and Mary Jane etc.
This is correct if for ever more I will always treat these values as one whole and never need to select first name only or last name only, or sort by last name etc.
If the last name or the first name is meaningful to me though, maybe I have to sort by last name in my queries, then I will create two fields firstname and lastName.
In your case, if the delimited items are not interesting at the database level then its fine to keep them in a single field. But if you will need to query by the delimited items then split them into their own fields.
Personally, I would approach that in two ways:
If it's feasible, I would simply turn that comma/pipe delimited list into a foreign key implementation and store that data in another table. This can have drawbacks if you're dealing with a ton of data but in most cases it can work and if you plan to query by these fields, you generally want to go this route.
Another approach is to simply store the data as a serialized object in your database rather than a delimited list. For example in Python you can use the
pickle
module to serialize standard objects which you can than store in the database without worry of code execution and other nasty hacks that can potentially happen.
If you find that increasingly you need to use the values stored in the string for searching in other tables then you should consider normalising your database as per other peoples suggestions. Drupal, Wordpress etc store basic information in strings and it works fine up to a point.
It depends - if the database doesn't need to do any sort of filtering or sorting on the data in the delimited strings and the data was originally sent to the database in a delimited format, why not? Why go to the trouble of parsing and splitting it into separate fields if the database is never going to use it?
It's generally considered bad because it defeats the purpose of having a database (and its powerful methods of indexing and searching data). However, it can be justified on a case by case basis. If there's already processing in place for the delimited strings, and you never want to do anything with the structure other than retrieve it... who's to stop you?
It isn't relational.
For example, if you have this:
abc | 123,456,789
def | 123
ghi | 123
Normalize it into something like this:
1 | abc | 123
2 | abc | 456
3 | def | 123
4 | ghi | 123
5 | abc | 789
Like all good design questions, the answer should be "it depends". A column in a relational db is meant to provide one discrete logical unit of information that can be used to characterize data. I would say that if there would never be any reason to characterize two different pieces of data by the content of the column - that is, you would never want to find some data but not others based on the content of the column - it's ok.
For example, if you were storing file permissions data in the column, and were only keeping the data to store the permissions of the file so that it can be read later, then it would be OK. If you ever wanted to query for those files which have u+x permissions, which is one component of the data in that column, you should separate the data to different columns.
It can be a good thing. I've worked with several systems that use database tables to log SOAP requests and responses. It would be an extremely ugly to try to normalize all of that data. There may be similar circumstances for comma-separated lists.
I would avoid putting delimited strings in a database field. A field should always hold only one value. This allows for more flexibility and allows you to use the built-in power of the database for a huge number of aggregate and other functions:
For example, suppose you had a database of Books and Authors (commonly used in database books)
- Getting a count of books by each author
- Adding or Removing a single book from an author
- Getting a total number of books by all authors
The possibilities are (almost) endless. But if a field is holding more than one value through the use of delimiters then that field needs to be parsed each time you need the answer to any of these basic functions.
It's important to remember that the database is there to serve the needs of the application, not the other way around. If your app needs to store a list of pipe-delimited data, so be it.
A good example of this is an admin tool that I've built. I needed version control capabilities for a large amount of relational data. I already had routines for serializing said data to and from JSON, so I created a new "versions" table which had some header information for each version and a text field which stored the JSON version. The editor also includes an "activate" button which deserializes and normalizes the data, and places it in separate tables so the primary application can access it to its heart's content. Note that this data is never modified by the primary app.
While I could have added a "version_id" field to every table in this schema, and added a corresponding parameter to all of my stored procedures, it would have been creating work for no good reason.
Just don't go overboard with this approach. It's technically possible to store your entire database in a single field, but that's hardly desirable or efficient.
As others have said, it's usually preferable to have the data normalised so that it can be accessed and updated easily, but there may be exceptional cases where there is no advantage in doing so.
What I would caution against is the "premature optimisation" mindset, where a designer assumes that holding the data in normalised tables will be bad for performance, without actually proving the point either way. I have often worked with databases where some information (like, a list of the roles that a user has) has been concatenated into a single string. I have almost always found that when, out of curiosity, I build the normalised version of the data and benchmark it, it performs at least as well as the "denormalised for performance" version.
If you're storing a snapshot of an item and would like to come back and see which options were selected at that time, I can see storing values like this into one single field. I've worked at a position where a single field would store a large XML file that repersented an invoice which was referenced as a historical document.
Using this type of method though for day to day usage would not be simple nor useful as stated from many of the other answers.
While this question is old I must say that I am having a pretty tough experience right now due to delimited strings that I have to use on the backend. I agree with most posts on this question in that it does depend on the needs of the application, but what also must be considered are the needs of the backend, meaning those who might actually have to use the data later on. As Vincent stated, if you plan on doing reporting, or somehow using the data that you're storing for some purpose later on, then generally you'll want to be sure you avoid making it harder on yourself or whomever may be using the data. If all you're worried about is storing it, then do whatever makes you happy.
精彩评论