开发者

Any scalable OLAP database (web app scale)?

I have an application that requires analytics for different level of aggregation, and that's the OLAP workload. I want to update my database pretty frequently as well.

e.g., here is what my update looks like (schema looks like: time, dest, source ip, browser -> visits)

(15:00-1-2-2010, www.stackoverflow.com, 128.19.1.1, safari) -->  105

(15:00-1-2-2010, www.stackoverflow.com, 128.19.2.1, firefox) --> 110

...

(15:00-1-5-2010, www.cnn.com, 128.19.5.1, firefox) --> 110

And then I want to ask what is the total visit to www.stackoverflow.com from a firefox browser last month.

I understand Vertica system can do this in a relatively cheap way (performance and scalability wise, but not cost-wise probably). I have two questions here.

1) Is there an open-source product that I can build upon to solve this problem? In particular, how well does a Mondrian system work? (scalability, and performance) 2) Is there an HBase or Hypertable base solution (obviously, a naked HBase/Hype开发者_StackOverflow中文版rtable can't do this) for this? -- but if there is a project based on HBase/Hypertable, scalability probably won't be an issue IMO)?

Thanks!


You can download a free edition (the single node edition) of the greenplum database. I haven't tried it myself but I think/guess it is a powerful beast. Read here: http://www.dbms2.com/2009/10/19/greenplum-free-single-node-edition/

Another option is MongoDB, it is fast and free and you can write MapReduce functions with JavaScript to do analytics.

My reputation here is to low to add a hyperlink to mongodb, so you have to google . I can add only one hyper link per post.


The zohmg project aims to solve this problem using Hadoop and HBase.


Facebook also built Hive on-top of Hadoop. Pretty simple to get going - reasonable query API too.

http://mirror.facebook.net/facebook/hive/


Is your data model more complex than that? If it isn't you might be beter of just writing custom code for it. Then you can really tune it to your data. Real products have to offer a lot of flexibility, need a lot of complexiy to achieve that, and suffer in speed as a result.

Your question is not clear in one aspect: when you talk about scalable, what do you mean by that? Are you collecting data from lots of sites but only have a limited amount of query users, or do you also have a lot of users? That situation leads to a significantly different model.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜