开发者

Why don't RDBMS support array types for columns?

Lets take the beaten to death example of a blog engine.

You have the blog, the blog has posts, the posts have tags for organizational purposes. After deciding that tagging problem is not trivial in an RDBMS environment, we go to google for guidance and find the following neat summary of the solutions as the first hit: designs and related benchmarks. However, all of them come at a cost of either performance or complexity. Seems like a NoSQL-like approach of letting you store a list of tags within a column (in NoSQL we can store docume开发者_开发知识库nts in documents) would solve the problem nicely. Why don't SQLServer/Qracle/MySQL/Postgres/etc. have it then?

At first I thought it might be because of the varying size. But any RDBMS worth noting allows some form of varchar and text (substantial in size). So sizing of the column (and the fact that the same column in different rows would have different size isn't the issue). So instead of storing a blob of text, let us store a list of items of the same type (an array in most languages) in a column. Let us index it for efficient exact searches matches. And at least for all use cases that I have the need for NoSQL DBs would disappear as a necessity (I know a lot of people are harping about scalability, but I don't know/care enough about that, I don't have scalability issue, I have maintenance nightmares). We get simplistic design of our schema (every bit as clean and simple as document in document of NoSQL) and great performance thanks to efficient indexing. Stranger still, that an open source DBs (e.g. Postgres) don't have some sort of patch for this feature. Developers with motivation in the fields seem way to enamored with creating new DBs from scratch, these days.

Am I missing some staggering technical obstacle or are the aforementioned RDBMS vendors just lazy or leaving in the past?


The reasons are historical.

Allowing values of any "collection" type inside a cell in a table, was typically considered a violation of 1NF, as it implied, "by definition", the possibility of "repeating groups" appearing inside a (single row of a) table.

Theory has evolved quite a bit since the early days of SQL, however, and theory currently has it that :

(a) Any type of value, including Array/Set/Collection types, should be allowed in a cell (b) To be in 1NF, simply means to be relational data. (But note that SQL tables typically are NOT "relational" data in the sense how theory defines the concept.)

Date has written dozens of pages on the subject. Reading your way through those will provide you with an answer to your question that is far more complete than any answer that can fit in here.


Why don't SQLServer/Qracle/MySQL/Postgres/etc.

They don't?


Short answer: Because that would be non-relational. Most NoSQL databases eschew the sort of relationships that make up a traditional Relational Database.

This "task" ("storing an array") can be accomplished in many ways -- XML, JSON, a custom format, or even custom database types, etc. The amount of support (including native type support as mabn pointed out) varies by the RDBMS. For instance, SQL Server provide a fair bit of XML support. However, this generally breaks database normalization (if it is cared about) -- in the case of NoSQL the baseline is often that it is not.

The benchmark also only really considers the intersection of relatively many tags in a query and it does not show any NoSQL solutions to this problem -- e.g. how would a NoSQL solutions find the results of a query for the intersection n-tags stored in an array?

That is, imagine that an array type is used. How long would those same queries take to execute? Without extensive uses of indices and hash-joins, I'd imagine "a very long time".

Happy thinking.


Postgres supports arrays.

It is true that array types are supposed to be a big no no as far as database normalisation forms is concerned but sometimes you have to live dangerously :-)

https://www.postgresql.org/docs/current/arrays.html


You can store a table inside the intersection of a column and a row. You can do everything you would be able to do with arrays in columns, and more.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜