mysql: storing arbitrary data

2023-02-05 13:08 问答作者：

Background:

I was asking a question on stack overflow regarding creating tables on the fly where this conversation ensued:

This smells like a terrible idea! In fact, it smells just like this one.
What in the world do you want to use this for? – deceze
@deceze: very true, Howe开发者_运维知识库ver, How else would you store the contents of these CSV files.
They must be stored in mysql for indexing. The only solid fact about them is that they all have a mobile column with a standard format.
The CSV can have an arbitrary amount of columns with an arbitrary amount of rows.
They can (with no exaggeration) range from a single row, 35 column csv to an 80k row single column CSV. I am open to other ideas. – Hailwood
There are many solutions for this, from attribute-value schemas to JSON storage and NoSQL storage. Open a new question about it. Whatever you do though, don't dynamically create tables! – deceze

Question:

So my question is, What would you say is the best way to store this data?

Are you in agreement with deceze about not creating dynamic tables?

A very simple schema for storing arbitrarily long records is this:

Table `records`
===============
id
created
... other meta data ...

Table `record_attributes`
=====================
id
record_id
attribute
value

So a CSV record can be stored like this:

"Foo","Bar","Baz"     // column names
42,"Lorem","Jerry"    // first record
7,"Ipsum","Tom"       // second record
...

records(1, '2010-1-17', ...)
record_attributes(1, 1, 'Foo', 42)
record_attributes(2, 1, 'Bar', 'Lorem')
record_attributes(3, 1, 'Baz', 'Jerry')

records(2, '2010-1-17', ...)
record_attributes(4, 2, 'Foo', 7)
record_attributes(5, 2, 'Bar', 'Ipsum')
record_attributes(6, 2, 'Baz', 'Tom')

An alternative is to store the record data as JSON packed blob in a single column. If you don't need to search for the data, this is the most compact way, albeit not very RDBMS.

The best fit is probably a NoSQL database, if you have that option.

Could an XML file work with this kind of thing? Using XQuery makes it similar in language structure to SQL, and XML has no problems with dynamic adding of data.

You might also check out Amazon's SimpleDB. It's specifically engineered for this purpose. It allows you to add arbitrary attributes to your records and indexes across all of them. I'm sure there are probably other solutions in the NoSQL realm as well.

I wanted to elaborate on deceze's records/record-attributes answer, but a comment didn't suffice...

This is reminiscent of SimpleDB's Items and Attributes model. If you come from the normal RDB world, look at the SimpleDB documentation to see some of the strangeness you'll need to account for, such as:

All values are stored as text, so to sort (or select value ranges) for non-string data types (usually numbers and dates), you need to take some unusual steps, including zero-padding and offsets so that data sorts correctly "lexicographically".
Consider what your queries will look like. To get Items with Attributes like Color='Red' and Size>3, you might start with something like this:

SELECT Items.*, Sizes.Value AS Size FROM Items INNER JOIN Attributes AS Colors ON Items.ItemID=Attributes.ItemID AND Attributes.Name='Color' INNER JOIN Attributes AS Sizes ON Items.ItemID=Attributes.ItemID AND Attributes.Name='Size' WHERE Colors.Value='Red' AND Sizes.Value>'003'

You could structure this query in a couple of alternative ways, but the main things to note are:

the more Attributes you want to filter by, the more JOINs you need. Note that you CAN'T simply do: SELECT ... FROM Items INNER JOIN Attributes USING ItemID WHERE (Attributes.Name='Color' AND Attributes.Value='Red') AND (Attributes.Name='Size' AND Attributes.Value>'003') -- which is self-evident once you see it written out
if you want additional Attributes in the response, you'll need to add more joins (I included Size to show it's simple for one of the already JOINed Attributes). But what if you want to retrieve a response that has columns for some larger number of Attributes for the selected Items? The query will start to get more complex. At least SimpleDB handles this stuff for you transparently, so that the response to a query looks like what you'd expect, with columns for specified Attributes.

The point is that storing the data this way is fairly easy, but querying it becomes harder. And if your dataset gets big, you may need to give some thought as to the proper way to index Attributes.

继续阅读：csv database php

mysql: storing arbitrary data

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？