Streaming text logfiles into RabbitMQ, then reconstructing at other end?

Requirements

We have several servers (20-50) - Solaris 10 and Linux (SLES) - running a mix of different applications, each generating a bunch of log events into textfiles. We need to capture these to a separate monitoring box, where we can do analysis/reporting/alerts.

Current Approach

Currently, we use SSH with a remote "tail -f" to stream the logfiles from the servers onto the monitoring box. However, this is somewhat brittle.

New Approach

I'd like to replace this with RabbitMQ. The servers would publish their log events into this, and each monitoring script/app could then subscribe to the appropriate queue.

Ideally, we'd like the applications themselves to dump events directly into the RabbitMQ queue.

However, assuming that's not an option in the short term (we may not have source for all the apps), we need a way to basically "tail -f" the logfiles from disk. I'm most comfortable in Python, so I was looking at a Pythonic way of doing that - the consensus seems to be to just use a loop with readline() and sleep() to emulate "tail -f".
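For reference, here is roughly what I had in mind for a single file. This is just a minimal sketch assuming the pika AMQP client; the host, exchange, and routing key names are placeholders:

    import time
    import pika

    # Emulate "tail -f" with readline()/sleep(), publishing each new
    # line to RabbitMQ. Assumes the pika client; names are placeholders.
    connection = pika.BlockingConnection(
        pika.ConnectionParameters(host="monitor-box"))
    channel = connection.channel()
    channel.exchange_declare(exchange="logs", exchange_type="topic")

    with open("/var/log/myapp.log") as f:
        f.seek(0, 2)  # start at the end of the file, like tail -f
        while True:
            line = f.readline()
            if not line:
                time.sleep(0.5)  # no new data yet; back off briefly
                continue
            channel.basic_publish(exchange="logs",
                                  routing_key="server1.myapp",
                                  body=line)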

Questions

  1. Is there an easier way to "tail -f" a whole bunch of textfiles directly onto a RabbitMQ stream? Something inbuilt, or an extension we could leverage? Any other tips/advice here?

  2. If we do write a Python wrapper to capture all the logfiles and publish them - I'd ideally like a single Python script to concurrently handle all the logfiles, rather than manually spinning up a separate instance for each logfile. How should we tackle this? Are there considerations in terms of performance, CPU usage, throughput, concurrency etc.?

  3. We need to subscribe to the queues, and then possibly dump the events back to disk and reconstruct the original logfiles. Any tips/advice on this? And we'd also like a single Python script we could start up to handle reconstructing all of the logfiles - rather than 50 separate instances of the same script - is that easily achievable? (The sketch below is roughly what I'm picturing.)
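Just for illustration, here's the rough shape of the consumer side I'm imagining. Again only a sketch, assuming a recent version of pika; the routing-key-to-filename convention is made up:

    import pika

    # Single consumer that appends every event back to a per-logfile
    # file on disk, keyed by routing key. Assumes pika and a topic
    # exchange named "logs"; all names are placeholders.
    connection = pika.BlockingConnection(
        pika.ConnectionParameters(host="monitor-box"))
    channel = connection.channel()
    channel.exchange_declare(exchange="logs", exchange_type="topic")
    queue = channel.queue_declare(queue="", exclusive=True).method.queue
    channel.queue_bind(exchange="logs", queue=queue, routing_key="#")

    def on_message(ch, method, properties, body):
        # e.g. routing key "server1.myapp" -> logs/server1.myapp.log
        with open("logs/%s.log" % method.routing_key, "ab") as f:
            f.write(body)

    channel.basic_consume(queue=queue, on_message_callback=on_message,
                          auto_ack=True)
    channel.start_consuming()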

Cheers, Victor

PS: We did have a look at Facebook's Scribe, as well as Flume, and both seem a little heavyweight for our needs.


You seem to be describing centralized syslog with RabbitMQ as the transport.

If you could live with syslog, take a look at syslog-ng. Otherwise, you might save some time by using parts of logstash (http://logstash.net/).


If possible, you could make the applications publish their events asynchronously to RabbitMQ instead of writing them to log files. I have done this in Java.

But sometimes it is not possible to make the app log the way you want.

1) You can write a file tailer in Python which publishes to AMQP. I don't know of anything that plugs in a file as an input to RabbitMQ. Have a look at http://code.activestate.com/recipes/436477-filetailpy/ and http://www.perlmonks.org/?node_id=735039 for tailing files.

2) You can create a Python daemon which tails all the given files, either as separate processes or in a round-robin fashion.

3) A similar approach to 2 can help you solve this. You can probably have a single queue for each log file; see the sketch below.
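A rough sketch of point 2: one process tailing several files in a round-robin loop, publishing each file's new lines under its own routing key (this assumes the pika client; the paths and names below are placeholders):

    import time
    import pika

    # One daemon tailing all the given files round robin; each file
    # publishes under its own routing key, so the consumer side can
    # have one queue per log file. Assumes pika; names are placeholders.
    LOGFILES = {
        "server1.app_a": "/var/log/app_a.log",
        "server1.app_b": "/var/log/app_b.log",
    }

    connection = pika.BlockingConnection(
        pika.ConnectionParameters(host="monitor-box"))
    channel = connection.channel()
    channel.exchange_declare(exchange="logs", exchange_type="topic")

    handles = {}
    for key, path in LOGFILES.items():
        f = open(path)
        f.seek(0, 2)  # start at the end, like tail -f
        handles[key] = f

    while True:
        idle = True
        for key, f in handles.items():  # round robin over all files
            line = f.readline()
            if line:
                idle = False
                channel.basic_publish(exchange="logs",
                                      routing_key=key, body=line)
        if idle:
            time.sleep(0.5)  # nothing new anywhere; avoid busy-looping

This stays single-threaded, which keeps CPU usage low; if one file gets very busy you could drain it with an inner loop before moving on to the next.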


If you are talking about application logging (as opposed to e.g. access logs such as Apache webserver logs), you can use a handler for stdlib logging which writes to AMQP middleware.
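For example, a minimal sketch of such a handler (assuming the pika client; a production version would also want reconnect handling):

    import logging
    import pika

    class AMQPHandler(logging.Handler):
        # Sketch of a stdlib logging handler that publishes formatted
        # records to a RabbitMQ topic exchange. Assumes pika; the names
        # passed in below are placeholders.
        def __init__(self, host, exchange, routing_key):
            logging.Handler.__init__(self)
            self.exchange = exchange
            self.routing_key = routing_key
            self.connection = pika.BlockingConnection(
                pika.ConnectionParameters(host=host))
            self.channel = self.connection.channel()
            self.channel.exchange_declare(exchange=exchange,
                                          exchange_type="topic")

        def emit(self, record):
            try:
                self.channel.basic_publish(exchange=self.exchange,
                                           routing_key=self.routing_key,
                                           body=self.format(record))
            except Exception:
                self.handleError(record)  # never raise from emit()

    log = logging.getLogger("myapp")
    log.addHandler(AMQPHandler("monitor-box", "logs", "server1.myapp"))
    log.error("disk nearly full")  # goes to RabbitMQ, not to a file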
