What is a columnar database?

2022-12-17 06:00 问答作者：

I have been working with warehousing for a while now.

I am intrigued by Columnar Databases and the speed that they have to offer for data retrievals.

I have multi-part question:

How do Columnar Databases work?
How do they differ from relational databases?

How do columnar databases work? The defining concept of a column-store is that the values of a table are stored contiguously by column. Thus the classic supplier table from CJ Date's supplier and parts database:

SNO  STATUS CITY    SNAME
---  ------ ----    -----
S1       20 London  Smith
S2       10 Paris   Jones
S3       30 Paris   Blake
S4       20 London  Clark
S5       30 Athens  Adams

would be stored on disk or in memory something like:

S1S2S3S4S5;2010302030;LondonParisParisLondonAthens;SmithJonesBlakeClarkAdams

This is in contrast to a traditional rowstore which would store the data more like this:

S120LondonSmith;S210ParisJones;S330ParisBlake;S420LondonClark;S530AthensAdams

From this simple concept flows all of the fundamental differences in performance, for better or worse, between a column-store and a row-store. For example, a column store will excel at doing aggregations like totals and averages, but inserting a single row can be expensive, while the inverse holds true for row-stores. This should be apparent from the above diagram.

How do they differ from relational databases? A relation database is a logical concept. A columnar database, or column-store, is a physical concept. Thus the two terms are not comparable in any meaningful way. Column- oriented DMBSs may be relational or not, just as row-oriented DBMS's may adhere more or less to relational principles.

How do Columnar Databases work?
Columnar database is a concept rather a particular architecture/implementation. In other words, there isn't one particular description on how these databases work; indeed, several are build upon traditional, row-oriented, DBMS, simply storing the info in tables with one (or rather often two) columns (and adding the necessary layer to access the columnar data in an easy fashion).

How do they differ from relational databases? They generally differ from traditional (row-oriented) databases with regards to ...

performance...
storage requirements ...
ease of modification of the schema ...

...in specific use cases of DBMSes.
In particular they offer advantages in the areas mentioned when the typical use is to compute aggregate values on a limited number of columns, as opposed to try and retrieve all/most columns for a given entity.

Is there a trial version of a columnar database I can install to play around? (I am on Windows 7) Yes, there are commercial, free and also open-source implementation of columnar databases. See the list at the end of the Wikipedia article for starter.
Beware that several of these implementations were introduced to address a particular need (say very small footprint, highly compressible distribution of data, or spare matrix emulation etc.) rather than provide a general purpose column-oriented DBMS per-se.

Note: The remark about the "single purpose orientation" of several columnar DBMSes is not a critique of these implementations, but rather an additional indication that such an approach for DBMSes strays from the more "natural" (and certainly more broadly used) approach to storing record entities. As a result, this approach is used when the row-oriented approach isn't satisfactory, and therefore and tends to
a) be targeted for a particular purpose b) receive less resources/interest than work on "General Purpose", "Tried and Tested", tabular approach.

Tentatively, the Entity-Attribute-Value (EAV) data model, may be an alternative storage strategy which you may want to consider. Although distinct from the "pure" Columnar DB model, EAV shares several of the characteristics of Columnar DBs.

I would say the best candidate to understand about column oriented databases is to check HBase (Apache Hbase) . You an checkout the code and explore further to find out about the implementation .

Also, Columnar DBs have a built in affinity for data compression, and the loading process is unique. Here's an article I wrote in 2008 that explains a bit more.

You may also be interested in a new report from IDC's Carl Olofson on 3rd generation DBMS technology. It discusses columnar, et al. If you're not an IDC client you can get it free on our site. He's doing a webinar on June 16th, too (also on our site).

(BTW, one comment above lists asterdata but I don't think they are columnar.)

To understand what is column oriented database, it is better to contrast it with row oriented database.

Row oriented databases (e.g. MS SQL Server and SQLite) are designed to efficiently return data for an entire row. It does it by storing all the columns values of a row together. Row-oriented databases are well-suited for OLTP systems (e.g., retail sales, and financial transaction systems).

Column oriented databases are designed to efficiently return data for a limited number of columns. It does it by storing all of the values of a column together. Two widely used Column oriented databases are Apache Hbase and Google BigTable (used by Google for its Search, Analytics, Maps and Gmail). They are suitable for the big data projects. A column oriented database will excel at read operations on a limited number of columns, however write operation will be expensive compared to row oriented databases.

For more: https://en.wikipedia.org/wiki/Column-oriented_DBMS

Product information. This may help. These were to featured products on a Google search.

http://www.vertica.com/

http://www.paraccel.com/

http://www.asterdata.com/index.php

kx is another columnar database, for example used in the financial sector. The licence is somewhat $50K last time I checked, though. No optimisation needed, no index needed, because kx has powerful operators (matlab equivalents: .*, kron, bsxfun, ...).

继续阅读：database sql

What is a columnar database?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？