
What is the best component stack for building a distributed log aggregator (like Splunk)?

I'm trying to find the best components I could use to build something similar to Splunk, in order to aggregate logs from a big number of servers in a computing grid. It also needs to be distributed, because I get gigs of logs every day and no single machine will be able to store them all.

I'm particularly interested in something that will work with Ruby and run on both Windows and the latest Solaris (yeah, I got a zoo).

I see architecture as:

  • Log crawler (Ruby script).
  • Distributed log storage.
  • Distributed search engine.
  • Lightweight front end.

The log crawler and distributed search engine are already settled: logs will be parsed by a Ruby script, and ElasticSearch will be used to index the log messages. The front end is also an easy choice: Sinatra.
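Since the crawler is a Ruby script, its parsing step might look something like this. This is a minimal sketch assuming syslog-style lines; the regex and field names are illustrative, and the resulting hash is the kind of document you'd hand to ElasticSearch for indexing:

```ruby
require 'json'

# Parse a syslog-style line into a hash ready for indexing.
# The line format and field names here are illustrative assumptions.
LOG_PATTERN = /\A(?<timestamp>\w{3}\s+\d+\s[\d:]+)\s(?<host>\S+)\s(?<program>[^:\[]+)(?:\[(?<pid>\d+)\])?:\s(?<message>.*)\z/

def parse_log_line(line)
  m = LOG_PATTERN.match(line) or return nil
  {
    'timestamp' => m[:timestamp],
    'host'      => m[:host],
    'program'   => m[:program],
    'pid'       => m[:pid],
    'message'   => m[:message]
  }
end

doc = parse_log_line('Jan  5 10:15:32 web01 sshd[2212]: Accepted publickey for deploy')
puts JSON.generate(doc)
```

Each parsed hash can then be sent to ElasticSearch (individually or via the bulk API), which keeps the crawler itself storage-agnostic.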

My main problem is distributed log storage. I looked at MongoDB, CouchDB, HDFS, Cassandra and HBase.

  • MongoDB was rejected because it doesn't work on Solaris.
  • CouchDB doesn't support sharding (smartproxy is required to make it work, but that's something I don't even want to try).
  • Cassandra works great, but it's a disk space hog and requires running autobalance every day to spread the load between nodes.
  • HDFS looked promising, but its FileSystem API is Java-only, and JRuby was a pain.
  • HBase looked like the best solution around, but deploying and monitoring it is a disaster: to start HBase I need to start HDFS first, check that it came up without problems, then start HBase and check it as well, and then start the REST service and check that too.

So I'm stuck. Something tells me HDFS or HBase is the best thing to use as log storage, but HDFS only works smoothly with Java, and HBase is a deployment/monitoring nightmare.

Can anyone share their thoughts or experience building similar systems, either with the components I described above or with something completely different?


I'd recommend using Flume to aggregate your data into HBase. You could also use the Elastic Search Sink for Flume to keep a search index up to date in real time.
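As a rough illustration of the Flume-to-HBase route, an agent tailing a log file into HBase can be configured along these lines. The agent/source/sink names, file path, and the HBase table and column family below are placeholders, not a tested configuration:

```properties
agent.sources = tail
agent.channels = mem
agent.sinks = hbase

agent.sources.tail.type = exec
agent.sources.tail.command = tail -F /var/log/app.log
agent.sources.tail.channels = mem

agent.channels.mem.type = memory
agent.channels.mem.capacity = 10000

agent.sinks.hbase.type = org.apache.flume.sink.hbase.HBaseSink
agent.sinks.hbase.table = logs
agent.sinks.hbase.columnFamily = raw
agent.sinks.hbase.channel = mem
```

Adding a second sink fed from the same source is how you'd keep the ElasticSearch index updated in parallel.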

For more, see my answer to a similar question on Quora.


With regard to Java and HDFS: using a tool like BeanShell, you can interact with the HDFS store from lightweight scripts (BeanShell interprets loosely-typed Java syntax) instead of writing and compiling a full Java application.
