I am trying to write a program that extracts the most frequently occurring IP address from a text file and displays that IP along with its number of occurrences.
Example of a line in the file: 172.16.121.170 - - [03/Sep/2018:09:35:32] GET /index.html HTTP/1.1 200 437
The file contains a list of lines like the one in the example. I have to read the file in Python, extract the IP addresses, and display the most frequent one and how many times it occurs.
from statistics import mode

def getinput():
    d = {}
    file = open("sample1.txt")
    for x in file:
        f = x.split(" ")
        d.update({f[0].strip(): f[0].strip()})
    return d

def counter(d):
    count = mode(d)
    occurences = 0
    for i in d:
        if i == mode:
            occurences = occurences + 1
    return count,occurences

def display(count,occurences):
    print(count)
    print(occurences)

d=getinput()
count,occurences=counter(d)
display(count,occurences)
This is what I have done so far. However, mode only returns the first IP in the list, and the occurrences do not seem to be counted: the program always displays "0".
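A minimal sketch of one way to repair the code above, offered as an illustration rather than the only fix. Three things appear to go wrong: the dict keeps a single key per distinct IP, so the duplicates are gone before counting; mode(d) then sees every IP exactly once and (on Python 3.8 and later) returns the first one; and `i == mode` compares each IP with the mode function itself, which is never true. The names read_ips and most_common_ip are hypothetical, and the sample lines stand in for sample1.txt.

```python
from statistics import mode

def read_ips(lines):
    # Keep every occurrence in a list; a dict would collapse duplicates.
    return [line.split()[0] for line in lines if line.strip()]

def most_common_ip(ips):
    top = mode(ips)  # most frequent value in the list
    # Count against the returned value, not against the mode function.
    return top, ips.count(top)

# Stand-in for the lines of sample1.txt (assumption: the IP is the first field).
sample = [
    "172.16.121.170 - - [03/Sep/2018:09:35:32] GET /index.html HTTP/1.1 200 437",
    "172.16.121.172 - - [03/Sep/2018:09:35:32] GET /index.html HTTP/1.1 200 437",
    "172.16.121.172 - - [03/Sep/2018:09:35:32] GET /index.html HTTP/1.1 200 437",
]

top_ip, occurrences = most_common_ip(read_ips(sample))
print(top_ip, occurrences)
```

To read from the real file, the same functions can be fed an open file object in place of the sample list.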
Python already provides a counter: collections.Counter.
You could use a generator (an iterator) to avoid building an intermediate data structure; this helps especially when there are many repeated values.
import re
from collections import Counter

def get_ips(fname):
    # a pattern to match IPv4
    ip_re = re.compile(r'^\s*(\d+\.\d+\.\d+\.\d+)')
    with open(fname) as file:
        for x in file:
            # extract the IP from the line
            # ignore if it does not have an IP
            ip_match = ip_re.search(x)
            if ip_match is not None:
                # group(1) is the pattern in parentheses, the IP.
                yield ip_match.group(1)

ips = Counter(get_ips("sample1.txt"))
ips.most_common(10)  # gets the 10 most frequent IPs
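Since the question asks for only the single most frequent IP and its count, a short sketch of pulling that out of a Counter may help. Counter.most_common(n) returns a list of (value, count) tuples sorted by count, so the first entry is the winner. The list of addresses below is made-up sample data standing in for the IPs read from the file.

```python
from collections import Counter

# Made-up sample data standing in for the IPs extracted from the log file.
ips = Counter([
    "172.16.121.170",
    "172.16.121.172",
    "172.16.121.172",
    "172.16.121.171",
])

# most_common(1) returns a one-element list: [(value, count)]
top_ip, hits = ips.most_common(1)[0]
print(f"{top_ip} occurred {hits} times")
```

Note that indexing with [0] raises IndexError on an empty Counter, so a file with no matching lines needs a separate check.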
You could do something like this:
- Use regex to search for IP addresses in the text file and append them to ip_list
- Identify the unique IP addresses
- Calculate the number of times each IP address is found
- Display the results
Code:
import re

ip_list = []
with open('sample1.txt') as f:
    for line in f:
        match = re.search(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}', line)
        # skip lines without an IP address (e.g. blank lines)
        if match:
            ip_list.append(match.group())

# Get unique items in the ip_list
unique_ip_list = set(ip_list)
ip_counts = {}

# Find out the counts for each IP address:
for ip in unique_ip_list:
    ip_counts[ip] = ip_list.count(ip)

print(ip_counts)
print()
print(f"Most common IP address: {max(ip_counts, key=ip_counts.get)} with {max(ip_counts.values())} times")
OUTPUT:
{'172.16.121.170': 2, '172.16.121.172': 3, '172.16.121.171': 1}
Most common IP address: 172.16.121.172 with 3 times
Tested with the following file:
sample1.txt
172.16.121.170 - - [03/Sep/2018:09:35:32] GET /index.html HTTP/1.1 200 437
172.16.121.170 - - [03/Sep/2018:09:35:32] GET /index.html HTTP/1.1 200 437
172.16.121.172 - - [03/Sep/2018:09:35:32] GET /index.html HTTP/1.1 200 437
172.16.121.172 - - [03/Sep/2018:09:35:32] GET /index.html HTTP/1.1 200 437
172.16.121.172 - - [03/Sep/2018:09:35:32] GET /index.html HTTP/1.1 200 437
172.16.121.171 - - [03/Sep/2018:09:35:32] GET /index.html HTTP/1.1 200 437
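One caveat with a pattern such as \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3} is that it also accepts out-of-range strings like 999.1.1.1. If that matters for your logs, a possible sketch is to pass each match through the standard library's ipaddress module, which raises ValueError for invalid addresses. The helper name extract_ip is hypothetical.

```python
import ipaddress
import re

IP_PATTERN = re.compile(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}')

def extract_ip(line):
    """Return the first IPv4-looking match if it is a valid address, else None."""
    match = IP_PATTERN.search(line)
    if match is None:
        return None
    try:
        # IPv4Address raises a ValueError subclass for octets above 255.
        ipaddress.IPv4Address(match.group())
    except ValueError:
        return None
    return match.group()

print(extract_ip("172.16.121.170 - - [03/Sep/2018:09:35:32] GET /index.html HTTP/1.1 200 437"))
print(extract_ip("999.1.1.1 - - not a real address"))
```

Lines for which extract_ip returns None can simply be skipped before counting.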
Here is a test input file that lists only the IP addresses. You will need to strip your log lines (or whatever your input is) down to a listing of just the IP addresses.
text.txt:
192.168.0.34
192.168.0.13
192.168.0.45
192.168.0.34
192.168.0.62
192.168.0.34
192.168.0.13
192.168.0.13
192.168.0.62
192.168.0.13
192.168.0.45
192.168.0.62
192.168.0.45
192.168.0.13
192.168.0.65
192.168.0.45
192.168.0.10
192.168.0.45
192.168.0.7
192.168.0.45
192.168.0.92
192.168.0.45
192.168.0.12
192.168.0.45
192.168.0.14
192.168.0.45
192.168.0.32
Here is the Python code with comments:
from collections import OrderedDict

ip_occurrences = OrderedDict()

# open your file
with open('text.txt', 'r') as f:
    # read all the IP addresses into a list
    ip_addresses = f.readlines()

# loop through the list of IP addresses
for ip in ip_addresses:
    # my text file had \n codes that needed to be filtered
    # we will need to remember this when we reference ip_addresses
    # for lookups as it is not filtered
    clean_ip = ip.replace('\n', '')
    # We check our clean IP address to see if it already exists.
    # We only want to add new ip addresses
    if clean_ip not in ip_occurrences.keys():
        # create a new key with the clean_ip name with the count of
        # the occurrences (not clean) in ip_addresses
        ip_occurrences[clean_ip] = ip_addresses.count(ip)

# winner winner chicken dinner! this is the IP address that occurred the most
most_freq_ip = max(ip_occurrences, key=ip_occurrences.get)

# display it however you see fit. I added ip_occurrences[most_freq_ip] to
# show what the count is
print(f'{most_freq_ip} occurred {ip_occurrences[most_freq_ip]} times')
Produces this output:
192.168.0.45 occurred 9 times
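This answer starts from a file that already contains only IP addresses. A rough sketch of the stripping step it mentions, assuming the IP is the first whitespace-separated field of each log line (as in the question's example), could look like the following. The function name log_to_ip_listing is hypothetical.

```python
def log_to_ip_listing(log_text):
    """Return the first field of every non-empty log line, one per line."""
    ips = []
    for line in log_text.splitlines():
        fields = line.split()
        if fields:  # skip blank lines
            ips.append(fields[0])
    # End with a newline so every entry, including the last, has the same form.
    return "\n".join(ips) + "\n" if ips else ""

sample_log = (
    "172.16.121.170 - - [03/Sep/2018:09:35:32] GET /index.html HTTP/1.1 200 437\n"
    "\n"
    "172.16.121.172 - - [03/Sep/2018:09:35:32] GET /index.html HTTP/1.1 200 437\n"
)
listing = log_to_ip_listing(sample_log)
print(listing, end="")
```

The trailing newline matters for code that counts raw lines including "\n", as the loop above does with ip_addresses.count(ip): a last line without it would be counted separately from identical earlier lines.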