
Search in big text log files

Let's say you have a game server that writes text log files of players' actions, and from time to time you need to look something up in those logs (for example, when investigating a scam or a lost item). Say you have 100 files, each between 20 MB and 50 MB. How would you search them quickly?

What I have already tried is to create several threads, where each individual thread maps its own file into memory (memory should not be a problem as long as usage does not exceed 500 MB of RAM) and performs the search there. The result was around 1 second per file:

File:a26.log - read in: 0.891, lines: 625282, matches: 78848

Is there a better way to do this? It seems kind of slow to me. Thanks.

(Java was used in this case.)
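For reference, the approach described above (one memory-mapped file per worker thread) might look roughly like the sketch below. This is not the asker's actual code: the class name `MappedLogSearch`, the `.log` file filter, and the naive byte-by-byte matching are all assumptions made for illustration. It compares raw bytes instead of decoding every line into a `String`, which is one common way to keep this kind of scan cheap.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class, not taken from the original post.
class MappedLogSearch {

    /** Counts the non-empty lines in buf that contain needle, comparing raw bytes. */
    static int countMatchingLines(ByteBuffer buf, byte[] needle) {
        int matches = 0;
        int lineStart = buf.position();
        int limit = buf.limit();
        for (int i = lineStart; i <= limit; i++) {
            // A line ends at '\n' or at the end of the buffer.
            if (i == limit || buf.get(i) == '\n') {
                if (i > lineStart && lineContains(buf, lineStart, i, needle)) {
                    matches++;
                }
                lineStart = i + 1;
            }
        }
        return matches;
    }

    /** Naive substring check over the byte range [from, to). */
    static boolean lineContains(ByteBuffer buf, int from, int to, byte[] needle) {
        for (int i = from; i + needle.length <= to; i++) {
            int j = 0;
            while (j < needle.length && buf.get(i + j) == needle[j]) {
                j++;
            }
            if (j == needle.length) {
                return true;
            }
        }
        return false;
    }

    /** Maps one file read-only and scans it (files must be under 2 GB). */
    static int searchFile(Path file, byte[] needle) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            return countMatchingLines(buf, needle);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("usage: MappedLogSearch <log-dir> <text>");
            return;
        }
        byte[] needle = args[1].getBytes(StandardCharsets.UTF_8);
        List<Path> files;
        try (Stream<Path> s = Files.list(Paths.get(args[0]))) {
            // Assumption: log files end in ".log", as in "a26.log".
            files = s.filter(p -> p.toString().endsWith(".log")).collect(Collectors.toList());
        }
        ExecutorService pool =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (Path f : files) {
                results.add(pool.submit(() -> searchFile(f, needle)));
            }
            for (int i = 0; i < files.size(); i++) {
                System.out.println(files.get(i) + ": " + results.get(i).get() + " matching lines");
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

If the files sit on a single spinning disk, adding more threads than the disk can feed may not help, so the pool size here is only a starting point.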


Tim Bray was investigating approaches to process Apache log files here: http://www.tbray.org/ongoing/When/200x/2007/09/20/Wide-Finder

Seems like there may be a lot in common with your situation.


You can use a combination of the Unix commands find and grep.


For ad-hoc searching of large text files, I would use the UNIX grep, fgrep or egrep utilities. They have been around a long time, and have had the benefit of many people working on them to make them fast.

On the other hand, the ultimate bottleneck in searching text files (that haven't been previously indexed) will be the speed at which the application and operating system can move data from a disk file into memory. You seem to be managing 20 MB or more per second, which seems reasonably fast to me.


I should probably have mentioned in the first post that the game server is written for 64-bit Windows. I wonder whether grep for Windows performs at the same level as grep on Unix?


Of course there is a better way: you index the contents before searching. The way you index depends on how you want to search the logs, but in general, you might do well using Lucene (or Solr, if the log entries can easily be restructured into XML documents).

The amount of performance and resource use optimization put into tools like the above should give you orders of magnitude better performance than an ad-hoc solution.

This is all assuming you search each file many times. If this is not the case, you might as well grep the files and be done with it.
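To make the indexing idea concrete, here is a toy inverted index written with only the Java standard library. It is a stand-in for illustration, not Lucene's API or anything close to its implementation: the class name `LogIndex`, the whitespace-and-punctuation tokenizer, and keeping every line in memory are all assumptions. A real index over several gigabytes of logs would store file offsets on disk instead of the lines themselves.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical toy index: maps each lowercased token to the lines containing it.
class LogIndex {
    private final List<String> lines = new ArrayList<>();
    // token -> ascending list of line ids (positions in `lines`)
    private final Map<String, List<Integer>> postings = new HashMap<>();

    /** Indexes one log line; the one-time cost is paid here, not at search time. */
    public void add(String line) {
        int id = lines.size();
        lines.add(line);
        for (String token : line.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (token.isEmpty()) {
                continue;
            }
            List<Integer> ids = postings.computeIfAbsent(token, k -> new ArrayList<>());
            // Skip duplicates when a token occurs twice in the same line.
            if (ids.isEmpty() || ids.get(ids.size() - 1) != id) {
                ids.add(id);
            }
        }
    }

    /** Returns every indexed line containing the given token (case-insensitive). */
    public List<String> search(String term) {
        List<Integer> ids =
                postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptyList());
        List<String> hits = new ArrayList<>();
        for (int id : ids) {
            hits.add(lines.get(id));
        }
        return hits;
    }

    public static void main(String[] args) {
        // Made-up sample log lines, for demonstration only.
        LogIndex index = new LogIndex();
        index.add("12:00:01 player=alice action=trade item=sword");
        index.add("12:00:05 player=bob action=login");
        index.add("12:03:40 player=alice action=drop item=sword");
        for (String hit : index.search("alice")) {
            System.out.println(hit);
        }
    }
}
```

A lookup only touches the posting list for the term instead of rescanning every file, which is why indexing pays off once the same logs are searched repeatedly.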
