mysql: storing arbitrary data
Background:
I was asking a question on stack overflow regarding creating tables on the fly where this conversation ensued:This smells like a terrible idea! In fact, it smells just like this one.
What in the world do you want to use this for? – deceze
@deceze: very true, Howe开发者_运维知识库ver, How else would you store the contents of these CSV files. They must be stored in mysql for indexing. The only solid fact about them is that they all have a mobile column with a standard format. The CSV can have an arbitrary amount of columns with an arbitrary amount of rows. They can (with no exaggeration) range from a single row, 35 column csv to an 80k row single column CSV. I am open to other ideas. – Hailwood
There are many solutions for this, from attribute-value schemas to JSON storage and NoSQL storage. Open a new question about it. Whatever you do though, don't dynamically create tables! – deceze
Question:
So my question is, What would you say is the best way to store this data? Are you in agreement with deceze about not creating dynamic tables?A very simple schema for storing arbitrarily long records is this:
Table `records`
===============
id
created
... other meta data ...
Table `record_attributes`
=====================
id
record_id
attribute
value
So a CSV record can be stored like this:
"Foo","Bar","Baz" // column names
42,"Lorem","Jerry" // first record
7,"Ipsum","Tom" // second record
...
records(1, '2010-1-17', ...)
record_attributes(1, 1, 'Foo', 42)
record_attributes(2, 1, 'Bar', 'Lorem')
record_attributes(3, 1, 'Baz', 'Jerry')
records(2, '2010-1-17', ...)
record_attributes(4, 2, 'Foo', 7)
record_attributes(5, 2, 'Bar', 'Ipsum')
record_attributes(6, 2, 'Baz', 'Tom')
An alternative is to store the record data as JSON packed blob in a single column. If you don't need to search for the data, this is the most compact way, albeit not very RDBMS.
The best fit is probably a NoSQL database, if you have that option.
Could an XML file work with this kind of thing? Using XQuery makes it similar in language structure to SQL, and XML has no problems with dynamic adding of data.
You might also check out Amazon's SimpleDB. It's specifically engineered for this purpose. It allows you to add arbitrary attributes to your records and indexes across all of them. I'm sure there are probably other solutions in the NoSQL realm as well.
I wanted to elaborate on deceze's records/record-attributes answer, but a comment didn't suffice...
This is reminiscent of SimpleDB's Items and Attributes model. If you come from the normal RDB world, look at the SimpleDB documentation to see some of the strangeness you'll need to account for, such as:
All values are stored as text, so to sort (or select value ranges) for non-string data types (usually numbers and dates), you need to take some unusual steps, including zero-padding and offsets so that data sorts correctly "lexicographically".
Consider what your queries will look like. To get Items with Attributes like Color='Red' and Size>3, you might start with something like this:
SELECT Items.*, Sizes.Value AS Size FROM Items INNER JOIN Attributes AS Colors ON Items.ItemID=Attributes.ItemID AND Attributes.Name='Color' INNER JOIN Attributes AS Sizes ON Items.ItemID=Attributes.ItemID AND Attributes.Name='Size' WHERE Colors.Value='Red' AND Sizes.Value>'003'
You could structure this query in a couple of alternative ways, but the main things to note are:
the more Attributes you want to filter by, the more JOINs you need. Note that you CAN'T simply do: SELECT ... FROM Items INNER JOIN Attributes USING ItemID WHERE (Attributes.Name='Color' AND Attributes.Value='Red') AND (Attributes.Name='Size' AND Attributes.Value>'003') -- which is self-evident once you see it written out
if you want additional Attributes in the response, you'll need to add more joins (I included Size to show it's simple for one of the already JOINed Attributes). But what if you want to retrieve a response that has columns for some larger number of Attributes for the selected Items? The query will start to get more complex. At least SimpleDB handles this stuff for you transparently, so that the response to a query looks like what you'd expect, with columns for specified Attributes.
The point is that storing the data this way is fairly easy, but querying it becomes harder. And if your dataset gets big, you may need to give some thought as to the proper way to index Attributes.
精彩评论