
How to approach IIS log parsing in a Pythonic way?

Ok, so I have some IIS logs that I would like to parse with Python (which I am fairly new to at the moment). A sample of an IIS log looks like this:

#Software: Microsoft Internet Information Server 6.0 
#Version: 1.0 
#Date: 1998-11-19 22:48:39 
#Fields: date time c-ip cs-username s-ip cs-method cs-uri-stem cs-uri-query sc-status sc-bytes cs-bytes time-taken cs-version cs(User-Agent) cs(Cookie) cs(Referrer) 

1998-11-19 22:48:39 206.175.82.5 - 208.201.133.173 GET /global/images/navlineboards.gif - 200 540 324 157 HTTP/1.0 Mozilla/4.0+(compatible;+MSIE+4.01;+Windows+95) USERID=CustomerA;+IMPID=01234 http://www.loganalyzer.net
1998-11-20 22:55:39 206.175.82.8 - 208.201.133.173 GET /global/something.pdf - 200 540 324 157 HTTP/1.0 Mozilla/4.0+(compatible;+MSIE+4.01;+Windows+95) USERID=CustomerA;+IMPID=01234 http://www.loganalyzer.net

There are only 2 lines of log data here, but I have thousands per log, so this is just a short example.

From these logs I would like to extract data like: counts of the client IP addresses that made the most connections, counts of the files that were downloaded the most, the URIs that were visited the most, etc. Basically what I want is to get some statistics. For example, as a result I would like to see something like this:

file download_count
example1.pdf 9
example2.pdf 6
example3.doc 2

or

IP file hits
192.168.1.5 /sample/example1.gif 8
192.168.1.9 /files/example2.gif 8

What I am not sure about is how to approach this in a Pythonic way. At first I thought I would split each line of the log into a list, and append each one to a bigger list (I see it as a 2D array). Once I got to the phase of extracting statistics from that big list, I started to think it might be better to build a dictionary out of all that data and count things by dict keys and dict values. Is that a better approach than using lists? If lists are the better choice, how should I approach it that way? What do I google, what do I look for?

So I am looking for ideas on how this is usually supposed to be done. Thanks.
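One thing worth googling here is collections.Counter: the dictionary idea is indeed the usual approach, and Counter is a dict subclass built for exactly this kind of tallying. A minimal sketch (the file names and counts below are made up for illustration):

```python
from collections import Counter

# Hypothetical list of downloaded file names extracted from log lines
downloads = ["example1.pdf", "example2.pdf", "example1.pdf",
             "example3.doc", "example1.pdf", "example2.pdf"]

counts = Counter(downloads)  # dict subclass mapping file -> count
for path, count in counts.most_common():
    # most_common() yields (item, count) pairs sorted by count, highest first
    print(path, count)
```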


Assuming that skip_header(file) returns only the log lines from the file and that parse(line) extracts the (ip, path) pair from a line:

from collections import defaultdict

first = defaultdict(int)                        # path -> hit count
second = defaultdict(lambda: defaultdict(int))  # ip -> path -> hit count
for line in skip_header(file):
    ip, path = parse(line)
    first[path] += 1
    second[ip][path] += 1
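For completeness, the two assumed helpers could be sketched like this (this is only one possible implementation; the field positions follow the #Fields directive in the sample log):

```python
def skip_header(f):
    """Yield only the data lines, skipping '#' directives and blank lines."""
    for line in f:
        line = line.strip()
        if line and not line.startswith("#"):
            yield line

def parse(line):
    """Extract (client-ip, uri-stem) from a W3C extended log line.

    Positions follow the sample's #Fields directive:
    date time c-ip cs-username s-ip cs-method cs-uri-stem ...
    so c-ip is field 2 and cs-uri-stem is field 6.
    """
    fields = line.split()
    return fields[2], fields[6]
```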

For the first:

print("path count")
for path, count in first.items():
    print("%s %d" % (path, count))

For the second:

print("ip path count")
for ip, d in second.items():
    for path, count in d.items():
        print("%s %s %d" % (ip, path, count))
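If you want the table ordered by count like the desired output in the question, collections.Counter gives you that for free: it accepts any mapping, and most_common() returns the entries sorted by count. A sketch using a stand-in for the first dict built above (the values here are made up):

```python
from collections import Counter

# Stand-in for the path -> count dict built in the loop above
first = {"example1.pdf": 9, "example2.pdf": 6, "example3.doc": 2}

print("file download_count")
for path, count in Counter(first).most_common():
    print(path, count)
```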
