Netezza, Teradata, DB2 Parallel/Enterprise, ... versus Hadoop or others?

2022-12-17 05:50

I'm looking at building some data warehousing/querying infrastructure, right now on top of Map/Reduce solutions like Hadoop.

However, it strikes me that all the M/R work is just repeating what the RDBMS guys solved over the last 20 years with parallel SQL databases. Parallel SQL implementations scale reads and writes across nodes, just like M/R, but they also already contain the niceties of regular databases (SQL, existing integration libraries, etc.).

The problem is: you don't seem to find the customers of those companies posting much online. So, does anyone here have experience with those kinds of solutions, and can give me some insight and/or links?

I have used Netezza and Hadoop, and I have secondhand knowledge of Infobright, a column database.

Netezza is a true database and implements ACID properties, which has both a cost and a benefit. Netezza is moving toward allowing more M/R code to run on its table data with the new TwinFin architecture. The previous version of the appliance supported user-defined functions and aggregates. The new version, which runs Linux on the SPUs and uses Intel processors, opens the door to running more custom code close to the data. My experience with Netezza has been very positive, both with the technology and with the company.

Hadoop is pure map-reduce computing. It doesn't incur the cost of ACID database properties, so it's really a different beast from Netezza. Depending on the usage pattern it may be better, and it is certainly cheaper, than Netezza. Hadoop also supports HBase and Hive, which may give you the query convenience you need at a lower cost.
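To make the comparison concrete, here is a minimal sketch of the map, shuffle, and reduce steps for the kind of aggregate a SQL database would express as `SELECT region, SUM(amount) FROM sales GROUP BY region`. It is plain single-process Python, not Hadoop's API, and the data and function names are made up for illustration.

```python
from collections import defaultdict

# Hypothetical sales records: (region, amount)
SALES = [("east", 100), ("west", 250), ("east", 50), ("north", 75), ("west", 25)]

def map_phase(records):
    """Emit one (key, value) pair per input record."""
    for region, amount in records:
        yield region, amount

def shuffle(pairs):
    """Group values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in groups.items()}

def total_by_region(records):
    return reduce_phase(shuffle(map_phase(records)))

totals = total_by_region(SALES)
print(totals)
```

In a real cluster the map and reduce steps run on many nodes and the shuffle moves data between them; a parallel SQL database plans roughly the same work from the one-line query.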

Another developer on our team evaluated Infobright, so this is secondhand: they found the load performance to be poor and some of the aggregations to be slow. It has some parallels with Netezza (e.g. Netezza uses zone maps to help narrow the scope of a scan). Infobright is open source, with both a community edition and a supported enterprise edition.
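For readers who haven't met zone maps: the general idea is to keep the minimum and maximum value of a column for each storage block, so a range query can skip blocks that cannot contain a match. The sketch below illustrates that idea only; it is not Netezza's or Infobright's actual implementation, and the block size and names are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Block:
    rows: list                      # values of one column stored in this block
    lo: int = field(init=False)     # smallest value in the block
    hi: int = field(init=False)     # largest value in the block

    def __post_init__(self):
        self.lo = min(self.rows)
        self.hi = max(self.rows)

def build_blocks(values, block_size):
    """Split a column into fixed-size blocks, each with its own min/max."""
    return [Block(values[i:i + block_size])
            for i in range(0, len(values), block_size)]

def scan_range(blocks, lo, hi):
    """Return matching values and how many blocks actually had to be read."""
    matches, scanned = [], 0
    for block in blocks:
        if block.hi < lo or block.lo > hi:
            continue                # zone map proves no row here can match
        scanned += 1
        matches.extend(v for v in block.rows if lo <= v <= hi)
    return matches, scanned

blocks = build_blocks(list(range(1, 101)), block_size=10)
matches, scanned = scan_range(blocks, 35, 42)
print(f"{len(matches)} matches, read {scanned} of {len(blocks)} blocks")
```

The skipping only pays off when the data is roughly ordered on the filtered column (dates in an append-only fact table, for example); on randomly ordered data almost every block's range overlaps the query.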

There is much more that can be said in context of your particular problem - probably beyond the scope of this forum. Hope this helps.

You haven't specified what questions you are trying to answer with your queries, or how your data is structured. Before you choose what solution to use you probably need to think about those two things.

You're correct: the major RDBMS vendors offer clustering solutions, both for parallel processing and for high availability. They've had this technology for a while, and any enterprise with a lot of data is probably using it. When you buy ($$$) the product they will give you lots of documentation and, if you can afford it, help you set it up (more $$$).

RDBMSs are good for online transactions (OLTP), for answering questions about specific rows (where does Mary live?), and for answering some summary-type questions (how much did we sell in the first quarter?). Although they can be made to answer detailed summary questions (how much did we sell in the first quarter, broken down by product, salesperson, month, and region?), at that point you're usually starting to tax their limits: any query that needs to visit all of the rows is going to be slow.
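As a rough illustration of the two kinds of summary question, here is a small sketch using Python's built-in sqlite3 module. The `sales` table, its columns, and the rows are hypothetical; a real warehouse would run the same shape of query over far more data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (product TEXT, salesperson TEXT, region TEXT, "
    "sale_date TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
    [
        ("widget", "Mary", "east", "2010-01-15", 100.0),
        ("widget", "Mary", "east", "2010-02-03", 50.0),
        ("gadget", "Bob", "west", "2010-01-20", 200.0),
        ("gadget", "Mary", "east", "2010-04-02", 75.0),  # outside Q1
    ],
)

# Fixed, trusted filter text (not user input), shared by both queries.
Q1 = "sale_date >= '2010-01-01' AND sale_date < '2010-04-01'"

# Simple summary: one number for the whole quarter.
q1_total = conn.execute(f"SELECT SUM(amount) FROM sales WHERE {Q1}").fetchone()[0]

# Detailed summary: the same total broken down along four dimensions.
breakdown = conn.execute(f"""
    SELECT product, salesperson, strftime('%m', sale_date) AS month,
           region, SUM(amount)
    FROM sales
    WHERE {Q1}
    GROUP BY product, salesperson, month, region
    ORDER BY product, salesperson, month, region
""").fetchall()

print(q1_total)
for row in breakdown:
    print(row)
```

Both queries have to read every first-quarter row; the second also has to group them, which is the part that gets expensive as the table and the number of dimensions grow.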

For those types of queries, most enterprises have a data warehouse that structures the data into multi-dimensional "cubes" (see Cognos, Hyperion, and others). That may be appropriate for what you're trying to do.

I don't have any experience with MapReduce, but I've read the Wikipedia section on its uses, so if what you're trying to do falls into those categories I'd continue with it.

If you are in a fast-paced, growing organization, you should use Teradata. We have had a really good experience with it. In our experience it gives you scalability that other vendors cannot match. Once you get used to its SQL and working style, you will really appreciate Teradata's design and architecture.
