Is there an open source tool to automatically find patterns in logfiles? [closed]
I've been working on a clustered system for many years, and decided it is time we had a tool that lets us query the plain-text logfiles (among other things) easily. I downloaded all the logfiles to an old test machine, where they take about 20 GB compressed, but would take 550 GB uncompressed (partly due to many stack traces). We have different "topics" maintained by different people, and our log formats have changed over the years. But let's just assume I could somehow turn them all into a single consistent format across all topics.
My question is: Is there a free/open source tool that I can just let loose on those files, and that will automatically recognize recurring, similar log messages? As an example message:
User John Smith has logged in from IP aaa.bbb.ccc.ddd. Duration: zzz ms.
Given many instances of such a message, the tool would work out a pattern like:
User * has logged in from IP *. Duration: * ms.
Here * is a placeholder for varying data. Once we have those patterns (which would need to be updated regularly), we could match each new message against the patterns and build useful statistics.
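To make the idea concrete, here is a minimal Python sketch of what I have in mind. It is not an existing tool, just an illustration under simplifying assumptions: messages are split on whitespace, only messages with the same number of tokens are compared, and the similarity threshold of 0.5 is an arbitrary choice. The function names (`discover_patterns`, `pattern_to_regex`, `count_matches`) are made up for this example.

```python
import re
from collections import Counter

WILDCARD = "*"

def similarity(template, tokens):
    """Fraction of positions where template and message have the same token."""
    same = sum(1 for t, tok in zip(template, tokens) if t == tok)
    return same / len(tokens)

def merge(template, tokens):
    """Keep tokens that agree; replace every differing position with a wildcard."""
    return [t if t == tok else WILDCARD for t, tok in zip(template, tokens)]

def discover_patterns(messages, threshold=0.5):
    """Greedily cluster messages into templates (hypothetical helper)."""
    templates = {}  # token count -> list of templates (each a list of tokens)
    for message in messages:
        tokens = message.split()
        if not tokens:
            continue
        candidates = templates.setdefault(len(tokens), [])
        best = max(candidates, key=lambda t: similarity(t, tokens), default=None)
        if best is not None and similarity(best, tokens) >= threshold:
            best[:] = merge(best, tokens)    # generalize the existing template
        else:
            candidates.append(list(tokens))  # start a new template
    return [" ".join(t) for group in templates.values() for t in group]

def pattern_to_regex(pattern):
    """Turn 'User * has ...' into an anchored regex with one group per wildcard."""
    parts = [re.escape(p) for p in pattern.split(WILDCARD)]
    return re.compile("^" + "(.+?)".join(parts) + "$")

def count_matches(patterns, messages):
    """Count how many messages match each pattern (first matching pattern wins)."""
    compiled = [(p, pattern_to_regex(p)) for p in patterns]
    stats = Counter()
    for message in messages:
        for pattern, regex in compiled:
            if regex.match(message):
                stats[pattern] += 1
                break
        else:
            stats["<unmatched>"] += 1
    return stats

# Made-up sample data, for illustration only.
sample = [
    "User alice has logged in from IP 10.0.0.1. Duration: 120 ms.",
    "User bob has logged in from IP 10.0.0.2. Duration: 95 ms.",
    "Cache flushed after 12 entries",
    "Cache flushed after 7 entries",
]
patterns = discover_patterns(sample)
print(patterns)
print(count_matches(patterns, sample))
```

Because the sketch splits on whitespace, the period after the IP address is swallowed by the wildcard, so the first derived pattern is `User * has logged in from IP * Duration: * ms.` rather than the exact form shown above. A two-word name such as "John Smith" would also end up in a separate template during discovery, since the token count differs. A real tool would need smarter tokenization and something far more scalable than this pairwise comparison for 550 GB of logs.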
Ideally the tool would be Java, or Python or Perl, as we use those, and we are in a mixed Windows/Linux environment.
This might also be an option: Grok, automatic log pattern discovery in Python