
Approximate session data from Apache access.log - Python

How might one use the IP address and timestamp from Apache's access log to approximate a "session" for a given visitor? A session would include all consecutive requests within a given period, say 60 seconds.

I have a class that parses the log file and follows an IP address through it (the log is in timestamp order, thankfully). The class creates a tuple of dictionaries, each containing the various log fields and a Python datetime object for the access timestamp.

import re

class ApacheLogParser(object):
    def __init__(self, file):
        # must be called through self; a bare __parse(file) raises NameError
        self.lines = self.__parse(file)
    def __parse(self, file):
        """ use a regex to parse the file
            return a tuple of dictionaries
        """
    def follow_ip(self, ip):
        """ all entries for a given ip, in order of appearance in the log """
        # note: ip is treated as a regex pattern, so '.' matches any character
        return (line for line in self.lines if re.search(ip, line['ip']))

log = ApacheLogParser('access.log')
for line in log.follow_ip('1.2.3.4'):
    print("%s %s" % (line['path'], line['datetime'].date()))

How might I add functionality to this class to be able to iterate through these approximated "sessions"?

Thanks!

EDIT: While writing this up, I came up with this:

import datetime

ip = '1.2.3.4'
ipdata = list(log.follow_ip(ip))
initial_dt = ipdata[0]['datetime']
# entries within 60 seconds of this IP's first request
sess = [x for x in ipdata if x['datetime'] < initial_dt + datetime.timedelta(seconds=60)]

It seems to work. Do you have any comments?


I wrote you some code, then managed to lose it =(.

One way, not necessarily the best, is to iterate through the lines, maintaining a dictionary that maps each IP address to the list of lines in its current session. For each line, first check all sessions for expiry (their last element's datetime being more than 60 seconds before the current line's); if one has expired, yield it and delete it from the dict. Then, if the line's IP is already in the dict, append the line to its list; otherwise, start a new session for it. Once the log is exhausted, yield whatever sessions are still open.
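
Here is a rough sketch of that idea, not the code I lost. The function name iter_sessions and the timeout parameter are made up for illustration, and it assumes each entry is a dict with 'ip' and 'datetime' keys, as described in the question. The expiry check runs before the current line is appended, so that a visitor who comes back after a long gap starts a fresh session.

```python
import datetime

def iter_sessions(lines, timeout=datetime.timedelta(seconds=60)):
    """Yield (ip, entries) pairs, one per approximated session.

    Assumes `lines` is an iterable of dicts with 'ip' and 'datetime'
    keys, sorted by timestamp (hypothetical helper, not part of the
    original class).
    """
    open_sessions = {}  # ip -> list of entries in its current session
    for line in lines:
        now = line['datetime']
        # Expire every session whose last request is older than `timeout`.
        # Iterate over a copy of the keys because we delete while looping.
        for ip in list(open_sessions):
            if now - open_sessions[ip][-1]['datetime'] > timeout:
                yield ip, open_sessions.pop(ip)
        # Append to this visitor's open session, or start a new one.
        open_sessions.setdefault(line['ip'], []).append(line)
    # End of log: whatever is still open is yielded as-is.
    for ip, entries in open_sessions.items():
        yield ip, entries
```

With the class from the question, something like "for ip, entries in iter_sessions(log.lines)" should walk through every visitor's sessions, assuming lines holds the parsed entries. Scanning all open sessions on every line is simple but not fast; for a very large log you would probably want something smarter.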
