开发者

Difference between a row-oriented and column-oriented databases in dealing information retrieval

Recently, I started working on HBase (one of the column-oriented databases). While going through the source code, one question keeps popping in my head. Thought of asking this. My question is, how exactly a row-oriented database deals w开发者_如何学Goith information retrieval (say a select query) and how different is that when comes to column-oriented database. And how different these databases store data in the underlying flat files ( at the end of day, every database uses files).

Please do correct me in case I went wrong in any part of this question.

Regards, Krishna


If I understand you correctly, you are interested more in the underlying storage and retreival issues, and less in the DDL and definition issues, the categories of column-oriented dbs, right ?

I will assume you understand that virtually all storage, regardless of vendor, is some form of:

  • a B-Tree for Indices
  • a Heap for un-organised data

On top of that foundation, each vendor has optimisations, and patented specialisations. Eg. Sybase (row) has:

  • Clustered Index, which marries the data rows to the B-Tree and eliminates the Heap.

The next issue is that all the vendors (except oracle) have reasonably sophisticated engines, with modular design, and the I/O is handled asynchronously, at a low level, to obtain speed. The Unit of I/O is a Page. Usually 2 to 8KB for OLTP systems and 8 to 64KB for DSS. (notice I an avoiding the Row vs Column issue.) So regardless of row/column the DSS engines are build for mass retrieval, due to getting more Index/Data rows or columns in large blocks, with less I/O requests.

"Large I/O" can be performed by reading Extents (8 Pages) and larger AllocationUnits (256 Pages) into memory with one I/O request. But the basic unit is the Page.

Row vs Column

  • Row
    • Each row is a contiguous unit on the page, and a number of rows are packed into the Pages.
    • For Indices, that doesn't really matter, because the entire data structure is the compounded columns in the key; the index entry or record is a small index entry+pointer, and much more index entries are packed into the same pages.
    • They are very fast for small numbers of rows; slow at summarising column aggregates
      .
  • Column
    • Each column is a contiguous unit on the Page. And since the column may be millions of entries (rows) long, they run for many, many Pages.
    • Indices are the same as Row above. With the addition of a specialised form of index that is supposed to be faster for columnar navigation.
    • They are phenomenal for columnar aggregates; very slow at constructing rows from the column-based data

All queries executed against the engine, have to navigate indices, retrieve data rows/columns from the above data storage structures.

The result is a multiplication of the above;

  • small/large blocksize, times

  • underlying physical structures, times

  • Row/Column orientation

Is that what you were looking for ? There is a set of technical (not warm and fuzzy) diagrams of the above for Sybase ASE, a OLTP/DSS strictly row-oriented engine, which I can get my hands on, if you are interested.

Responses to Comments

.
you mean to say that eventually we will boil down to page irrespective of database type.

Yes.

If this is the case, then how clustering of a database will be done. Lets take a database that stores data in row mannered. If I am doing clustering for this type of databases, how exactly will the table structured be carried to different nodes ( if I have more than one node). Will this table structure linked to a page or will it be through a different mechanism.

You know, before I answer the question, I have to acknowledge you. For someone with your level of knowledge, it is excellent that you have penetrated down to that crucial point, obtained that insight. Shiva ki Jai!

Yes, that is the critical design problem of a clustered DBMS, the crucial limiting issue, above all the various design issues related to clustering; that if the vendor handles this issue well, the cluster works well; and if not, the cluster is a dogs breakfast.

Everything in IT is governed by the laws of physics. Nothing is free, every function of feature has a cost, processing or storage. There is no magic, except maybe in MS marketing brochures.

Good Clustered DB Architecture

I do not know all clustered DBMS; I know Sybase CE and Oracle RAC really well. Working knowledge of Sybase IQ.

  • Oracle RAC has been around much longer, and is more mature. It handles this critical issue quite badly. So it ends up contending with itself and needs way more CPU power (cores, CPUs, not nodes) than the original estimate. The more nodes, the more contention.
    .
    It should be noted that Oracle non-RAC architecture is crap, or more precisely, non-existent; so the RAC has a sandy foundation to build on.
    .
    Not to mention, the stability sucks dead bears.
    .
  • Sybase CE is only one year old. But the architecture is brilliant, it handles this critical issue really well. There is only one version of the page on a SAN. All nodes are connected to the SAN. Any node can read or write the Page. The nodes are connected by a private LAN (in addition to the normal client-server LAN used by everything else on the network). The nodes co-ordinate the locks plus a bit of inter-node communication for laod-balancing, etc.
    .
    At the end of the day, for maximum concurrency, even with Sybase CE, you need to logically partition the databases, so that the workload on each node is separated, is accessing diffrent filepaths, or separate physical areas of the shared db.

  • Sybase IQ is already 100% column oriented. It is their DW offering. It already does full load balancing. It can be used is a cluster, but not clustering in the CE sense described above. I should have included it in

Poor Clustered DB Architecture

The dogs breakfast type of clustered dbms do stupid things. To list a few:

  • have the pages stored on every node[massive duplication], but then have to move the updated pages around the cluster

  • use MVCC to overcome the problem (but MVCC is far more overhead and actually slows the concurrency down, so it is fighting itself)

Cluster Unsuitable for Dedicated DB Server

Basically clusters are great for some applications, but it is a stupid idea for dedicated db servers (one fact in one place; shared resources which are administered together; lock contention which is most efficient when managed in one place because the data is in one place). I would never recommend a cluster for a db server.

  • Same as the SAN problem. Sure, many people have their db storage sitting inside a SAN, but for highest speed, and isolation from the load issues of other servers connected to the SAN, nothing comes close to local disk.

  • Same as the VMWare problem. Sure, many people have the db server established as a VMWare host unit, but for the highest speed, remove the overhead of VMWare; for isolation from the load issues of other host units in the chassis, get it out of there, onto a dedicated hard box.

Why Do DB Vendors Bother with a Cluster

  1. Oh there is value in it, but not now, in the future. AFAIC, the Sybase architecture will predominate over time, and all others will fall by the wayside. Every vendor will copy it as usual.

    The real power of Sybase CE is:

    • true 100% uptime (being able to add a node to the cluster and take the old node down for maintenance) and

    • fully dynamic load balancing (say the existing node is 4 x quad core; add a temp 4 x quad core node; take the old node down; insert 2 x quad core; bring it up; take down the temp node down) and then within 60 seconds, with no fingers on any keyboard, the whole beast rebalances.

    A shop that can stagger the nightly db maintenance schedule of their several single-node servers, can save a fair amount of money; they just have a a couple of additional machines for switching in/out.

  2. Data Warehouses are a little different. They are mostly read-only. So it is no problem to host it on a cluster (many reader nodes, only one writer node, no contention, no one cares that the pages are being written as they are being read). Sybase IQ is such a product.

Sybase CE for Column Oriented

  1. Sybase IQ is already column oriented and can be deployed in a cluster, but not clustering in the CE sense described above. Columns are mapped to pages. I should have included it in Good Clustered Db Architecture above, corrected now.

  2. I am not aware of hybrids combining column- and row-oriented that are worthwhile.

  3. But the full answer to that question is, use a pure Db (not DW) such as Sybase ASE or ASE/CE, and implement a true Sixth Normal Form database. That is the ultimate Normalisation, the irreducible NF, with several substantial advantages, including speed and ease of pivoting. It provides column-oriented storage on the pages. Due to SQL not supporting 6NF fully, you will need to provide Views to supply 5NF rows from the (stored) 6NF structures. I wrote an extension to the catalogue, so that I could generate the SQL code for developers to use.


One problem with your question is that the long-standing database term "column oriented" has been appropriated (some might say "hijacked"!) by the NOSQL community to describe something completely different from what it originally meant. Both meanings of "column oriented" are still current but they refer to very different DBMS products. So it is often helplful to clarify which you are talking about. In this case, it's the NOSQL meaning of the term.

In the original meaning of a column oriented database the answer to your question is that there is no difference in the way you retrieve information. Column store isn't a different data model, it is simply a different type of representation in internal storage.

In the NOSQL community however, the term column store refers to a different kind of data model.

Good explanations here:

http://dbmsmusings.blogspot.com/2010/03/distinguishing-two-major-types-of_29.html


Row oriented databases, a.k.a. "traditional RDBMS" (such as MySQL, Oracle, DB2) use transactional secondary index updates, use B-Tree -like structures for secondary indices in most cases

Column oriented databases, a.k.a. "NoSQL" (such as Google Big Table, HBase, Cassandra) use simplified structures for primary key indices (which are not B-Tree)

Column oriented databases do not support "traditional" transactional secondary indices. It is responsibility of a user to maintain "inverted index".

Cassandra supports B-Tree -like index for a row: each cell in a row has a title, and cells are physically sorted by title.

Another (possibly super important) difference: for zillions records in Oracle, you will need to maintain B-Tree for primary key, and it's size will be also zillions-like; performance of "find by primary key" is not good.

On the other hand, you can have "wide rows" in Cassandra or HBase, and unite similar "cells" into single wide row; size of "primary key index" becomes millions times smaller, and "find by primary key" is super fast (and it is not B-Tree; it is clustered search)

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜