How to do Log Mining?

2023-02-16 07:57 Author:

To figure out (or at least guess) something about how one of our proprietary desktop tools, built with wxPython, is used, I injected a logging decorator into several relevant class methods. Each log record looks like the following:

Right now there are more than 3M log records in the database, and I've started to wonder: "What can I get out of all this?" So far I can get information such as:

Hit rate of each (klass, method) pair per time period (e.g. per week).
Power users, ranked by record count.
Approximate crash rate, estimated from missing closing records compared with opening records.
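A minimal sketch of how these three numbers could be computed, in Python since the tool is wxPython. The question's sample record is not shown, so the record layout here (a dict with `user`, `klass`, `method`, `event` set to `"open"` or `"close"`, and an epoch `ts`) is an assumption, as are the `log_call` decorator, the in-memory `LOG_RECORDS` list standing in for the database, and the `Editor` class. Week buckets use ISO weeks in UTC.

```python
import functools
import os
import time
from collections import Counter
from datetime import datetime, timezone

# Hypothetical in-memory sink; the real tool writes records to a database.
LOG_RECORDS = []


def log_call(func):
    """Append an 'open' record before and a 'close' record after each call."""
    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        base = {
            "user": os.environ.get("USER", "unknown"),  # assumed user source
            "klass": type(self).__name__,
            "method": func.__name__,
        }
        LOG_RECORDS.append({**base, "event": "open", "ts": time.time()})
        result = func(self, *args, **kwargs)
        # Reached only on a normal return: a crash (or an uncaught exception)
        # leaves an 'open' record with no matching 'close'.
        LOG_RECORDS.append({**base, "event": "close", "ts": time.time()})
        return result
    return wrapper


class Editor:  # hypothetical application class
    @log_call
    def save(self, path):
        return f"saved {path}"


def weekly_hits(records):
    """Count 'open' records per (ISO year, ISO week, klass, method)."""
    hits = Counter()
    for r in records:
        if r["event"] == "open":
            day = datetime.fromtimestamp(r["ts"], tz=timezone.utc)
            year, week, _ = day.isocalendar()
            hits[(year, week, r["klass"], r["method"])] += 1
    return hits


def power_users(records, n=10):
    """Rank users by total record count."""
    return Counter(r["user"] for r in records).most_common(n)


def approx_crash_rate(records):
    """Fraction of 'open' records that never got a 'close' record."""
    events = Counter(r["event"] for r in records)
    opens = events["open"]
    return (opens - events["close"]) / opens if opens else 0.0
```

The crash-rate estimate is a global count: an uncaught exception also leaves an unmatched `open`, so if you want to tell the two apart, logging an extra event from an `except` block (and then re-raising) would be one option.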

I guess the relevant technique is log mining. Does anyone have ideas about what further information I could extract from this very simple log? I'd really like to get more out of it.

SpliFF is right, you'll have to decide which questions are important to you and then figure out if you're collecting the right data to answer them. Making sense of this sort of operational data can be very valuable.

You probably want to start by seeing if you can answer some basic questions, and then move on to the tougher stuff once you have your log collection and analysis workflow established. Some longer-term questions you might consider:

What are the most common and most severe bugs being encountered "in the wild", ranked by frequency and impact? Data: capture stack traces / call points and, if possible, method arguments.
Can you simplify some of the common actions your users perform? If X is the most common action, can the number of steps be reduced, or can individual steps be simplified? Data: sessions and clickstreams for the common workflows; features ranked by frequency of use and by number and complexity of steps.
Some features may be confusing or have conflicting options that lead to user mistakes. Sessions where the user backs up several times to repeat a step, or starts over from the beginning, may be telling.
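One possible way to get at sessions and the "backing up" signal from a log like this, sketched under assumptions: records are reduced to hypothetical `(user, timestamp, step)` tuples, where a step is a `(klass, method)` pair taken from the opening records; a session ends after 30 idle minutes (an arbitrary `GAP`); and a "backtrack" is any step that revisits an earlier step in the same session, ignoring immediate repeats. `sessionize`, `backtrack_count`, and `suspicious_sessions` are illustrative names, not an existing library.

```python
from datetime import datetime, timedelta
from itertools import groupby

# Assumed inactivity threshold that ends a session; tune it to your data.
GAP = timedelta(minutes=30)


def sessionize(records, gap=GAP):
    """Split each user's time-ordered steps into sessions separated by idle gaps.

    records: iterable of (user, timestamp, step) tuples (hypothetical layout).
    """
    sessions = []
    ordered = sorted(records, key=lambda r: (r[0], r[1]))
    for user, rows in groupby(ordered, key=lambda r: r[0]):
        current, last_ts = [], None
        for _, ts, step in rows:
            if last_ts is not None and ts - last_ts > gap:
                sessions.append((user, current))
                current = []
            current.append(step)
            last_ts = ts
        sessions.append((user, current))  # each group has at least one row
    return sessions


def backtrack_count(steps):
    """Count steps that revisit an earlier step, ignoring immediate repeats."""
    seen, count = set(), 0
    for prev, step in zip([None] + steps, steps):
        if step in seen and step != prev:
            count += 1
        seen.add(step)
    return count


def suspicious_sessions(records, min_backtracks=2):
    """Sessions in which the user went back to earlier steps repeatedly."""
    return [(u, s) for u, s in sessionize(records)
            if backtrack_count(s) >= min_backtracks]


# Tiny illustrative data: alice bounces between steps, bob goes straight through.
t0 = datetime(2023, 2, 13, 9, 0)
OPEN, EDIT, SAVE = ("Doc", "open"), ("Doc", "edit"), ("Doc", "save")
sample = [("alice", t0 + timedelta(minutes=i), s)
          for i, s in enumerate([OPEN, EDIT, OPEN, EDIT, SAVE])]
sample.append(("alice", t0 + timedelta(hours=2), OPEN))  # new session after idle gap
sample += [("bob", t0 + timedelta(minutes=i), s)
           for i, s in enumerate([OPEN, EDIT, SAVE])]
```

With this sample, alice's first session revisits OPEN and EDIT once each, giving two backtracks, so it is flagged; her later one-step session and bob's straight-through session are not.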

You may also want to notify users that data is being collected for quality purposes, and even solicit some feedback from within the app's interface.

Patterns!

Patterns preceding failures. Say a failure was logged; now consider exploring these questions:

What was the sequence of klass-method combos that preceded it?
What about other combos?
Is it always the same sequence that precedes the same failures?
Does a sequence of minor failures precede a major failure?
etc.
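A rough sketch for the first and third questions, assuming the log has been flattened into a time-ordered list of labels (hypothetical `"Klass.method"` strings, with failures marked by a prefix such as `"ERR:"`). For each failure it counts the `k` events immediately before it; earlier failures are left inside the window, so a minor failure preceding a major one shows up as well. The function names and the labeling scheme are assumptions.

```python
from collections import Counter, defaultdict


def preceding_sequences(events, is_failure, k=3):
    """Map each failure label to a Counter of the k events that preceded it.

    events: time-ordered labels (hypothetical scheme). Failures near the start
    of the log get a shorter window.
    """
    result = defaultdict(Counter)
    for i, event in enumerate(events):
        if is_failure(event):
            window = tuple(events[max(0, i - k):i])
            result[event][window] += 1
    return result


def ranked(result):
    """Most common preceding sequence per failure, with its share of occurrences."""
    summary = {}
    for failure, windows in result.items():
        window, count = windows.most_common(1)[0]
        summary[failure] = (window, count / sum(windows.values()))
    return summary
```

A share of 1.0 from `ranked` means the same sequence preceded every occurrence of that failure in the data you fed it.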

One way to compare patterns is as follows:

Classify each message
Represent each class/type with a unique ID, so you now have a sequence of IDs
Slice the sequence into time periods to compare
Compare the slices (arrays of IDs) with a diff algorithm
Retain samples of periods to establish the common patterns, then compare new samples for the same periods to establish a degree of anomaly
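The steps above could be sketched roughly as follows. This is one interpretation, not a definitive implementation: `classify` simply takes the first token of a message as its type (a hypothetical scheme; real classification may need more), slices are hourly buckets of epoch timestamps, and the diff comparison uses the standard library's `difflib.SequenceMatcher`, whose `ratio()` returns a similarity between 0 and 1.

```python
import difflib
from collections import defaultdict


def classify(message):
    """Hypothetical classifier: the first token (e.g. 'Klass.method') is the type."""
    return message.split(" ", 1)[0]


def slice_by_period(records, vocab, period=3600):
    """Turn (epoch_seconds, message) records into one ID sequence per time bucket.

    vocab maps message type -> integer ID and grows as new types appear.
    """
    buckets = defaultdict(list)
    for ts, message in sorted(records):
        msg_id = vocab.setdefault(classify(message), len(vocab))
        buckets[int(ts // period)].append(msg_id)
    return dict(buckets)


def anomaly_score(baseline, sample):
    """1 minus the best diff similarity between sample and any baseline slice.

    0.0 means the sample matches a retained slice exactly; values near 1.0
    mean it shares little with anything retained. An empty baseline gives 1.0.
    """
    best = max((difflib.SequenceMatcher(None, ref, sample).ratio()
                for ref in baseline), default=0.0)
    return 1.0 - best
```

In practice you would probably keep separate baselines for comparable periods (for example the same hour of the day or day of the week), as the last step suggests, and only score a new slice against its own baseline.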
