Number of visitors per hour
I have a server log formatted like this:
128.33.100.1 2011-03-03 15:25 test.html
I need to extract several things from it, but I am mostly stuck on how to get the total number of visits per hour as well as the number of unique visitors per page. Any pointers would be appreciated.
Assuming your log file contains only lines like this, here is how I think you should go about it (this assumes no database is involved):
1. Create a class that represents each line (a model), with fields for the IP, date, time and file.
2. Add a method on this model that returns a Java timestamp built from the date and time.
3. Create a HashMap that stores file names as keys and lists of objects of the above class as values.
4. Read the log one line at a time.
5. For each line:
   a. Use StringTokenizer to get the IP, date, time and file as tokens.
   b. Populate an object of the above class.
   c. Append this object to the list for that file name in the hash map (create the list if it doesn't exist yet).
Now you have all the data in a usable data structure.
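The steps above could be sketched in Java roughly like this. The names `LogIndex`, `LogEntry`, `index` and `timestamp` are hypothetical, the "Java timestamp" is assumed to be a `java.time.LocalDateTime`, and every well-formed line is assumed to have exactly the four space-separated fields from the question (shorter lines are skipped):

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

public class LogIndex {
    // Hypothetical model class: one object per log line.
    static class LogEntry {
        final String ip, date, time, file;

        LogEntry(String ip, String date, String time, String file) {
            this.ip = ip;
            this.date = date;
            this.time = time;
            this.file = file;
        }

        // Timestamp built from the date and time fields, e.g. 2011-03-03T15:25
        LocalDateTime timestamp() {
            return LocalDateTime.parse(date + "T" + time);
        }
    }

    // Groups entries by file name, creating a new list the first time a file is seen.
    static Map<String, List<LogEntry>> index(List<String> lines) {
        Map<String, List<LogEntry>> byFile = new HashMap<>();
        for (String line : lines) {
            StringTokenizer tok = new StringTokenizer(line);
            if (tok.countTokens() < 4) continue; // skip malformed lines
            LogEntry e = new LogEntry(tok.nextToken(), tok.nextToken(),
                                      tok.nextToken(), tok.nextToken());
            byFile.computeIfAbsent(e.file, k -> new ArrayList<>()).add(e);
        }
        return byFile;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "128.33.100.1 2011-03-03 15:25 test.html",
            "128.33.100.2 2011-03-03 15:40 test.html",
            "128.33.100.1 2011-03-03 16:05 index.html");
        Map<String, List<LogEntry>> byFile = index(lines);
        System.out.println(byFile.get("test.html").size()); // 2
        System.out.println(byFile.get("index.html").get(0).timestamp()); // 2011-03-03T16:05
    }
}
```

`computeIfAbsent` covers the "create the list if it doesn't exist" step in a single call.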
To get the number of unique visitors for each page: retrieve the list for that file name from the hash map, then count the distinct IP addresses in it. The Java Collections framework can do the counting for you, for example by adding the IPs to a HashSet and taking its size.
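A small sketch of the HashSet approach. It assumes the IP strings have already been pulled out of the list stored for one file name; `UniqueVisitors` and `uniqueVisitors` are hypothetical names:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class UniqueVisitors {
    // Counts distinct IP addresses among one page's visits.
    static int uniqueVisitors(List<String> ipsForPage) {
        Set<String> seen = new HashSet<>(ipsForPage); // a HashSet keeps each IP only once
        return seen.size();
    }

    public static void main(String[] args) {
        // IPs taken from the entries stored for one file name
        List<String> ips = Arrays.asList("128.33.100.1", "128.33.100.2", "128.33.100.1");
        System.out.println(uniqueVisitors(ips)); // 2
    }
}
```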
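To get the number of visits per hour for each page: 1. Retrieve the correct list as above and find the minimum and maximum timestamps. 2. Work out the time between them in hours, then divide the number of entries in the list by that number of hours. Note that this gives the average number of visits per hour over the whole period, not a separate count for each hour.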
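A sketch of that average calculation, assuming the timestamps are `java.time.LocalDateTime` values taken from one page's list. The names are hypothetical, and treating a span shorter than one hour as one hour (to avoid dividing by zero) is an assumption the answer does not cover:

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class AverageRate {
    // Average visits per hour: number of entries divided by the span,
    // in hours, between the earliest and latest timestamp.
    static double visitsPerHour(List<LocalDateTime> times) {
        if (times.isEmpty()) return 0.0;
        LocalDateTime min = Collections.min(times);
        LocalDateTime max = Collections.max(times);
        double hours = Duration.between(min, max).toMinutes() / 60.0;
        // Assumption: spans under one hour are treated as one hour.
        return times.size() / Math.max(hours, 1.0);
    }

    public static void main(String[] args) {
        List<LocalDateTime> times = Arrays.asList(
            LocalDateTime.parse("2011-03-03T15:25"),
            LocalDateTime.parse("2011-03-03T16:10"),
            LocalDateTime.parse("2011-03-03T17:25"));
        System.out.println(visitsPerHour(times)); // 1.5
    }
}
```

Three visits spread over two hours (15:25 to 17:25) average out to 1.5 visits per hour.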
Hope that helps.
If you are splitting each line into an array, I would suggest taking the hour from the 3rd element and checking each subsequent line the same way: from the first line where you see 15 until the first line where you see 16, use a counter to store the number of hits in that hour.
Splitting a String can be done like this:
String[] temp;
String str = "firstElement secondElement thirdElement";
String delimiter = " ";
temp = str.split(delimiter); // temp will be filled with three elements
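Putting the split and the counter together, a sketch might look like this. It assumes the log is in chronological order, and it keys on the date as well as the hour so that 15:00 on different days is not merged; `SequentialHourCount` and `hitsPerHour` are hypothetical names:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SequentialHourCount {
    // Assumes lines are in chronological order: a new counter starts
    // whenever the date or hour differs from the previous line.
    static List<String> hitsPerHour(List<String> lines) {
        List<String> result = new ArrayList<>();
        String currentKey = null;
        int count = 0;
        for (String line : lines) {
            String[] temp = line.split(" ");
            String hour = temp[2].substring(0, temp[2].indexOf(':')); // "15:25" -> "15"
            String key = temp[1] + " " + hour;                        // e.g. "2011-03-03 15"
            if (!key.equals(currentKey)) {
                if (currentKey != null) result.add(currentKey + " -> " + count);
                currentKey = key;
                count = 0;
            }
            count++;
        }
        if (currentKey != null) result.add(currentKey + " -> " + count);
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "128.33.100.1 2011-03-03 15:25 test.html",
            "128.33.100.2 2011-03-03 15:40 test.html",
            "128.33.100.1 2011-03-03 16:05 index.html");
        hitsPerHour(lines).forEach(System.out::println);
        // 2011-03-03 15 -> 2
        // 2011-03-03 16 -> 1
    }
}
```

If the log is not sorted, the same hour can show up in more than one run, so a map keyed by date and hour is safer in that case.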
As for the unique visitors per page, you can take the 1st element of the array you got from splitting and put it into a HashMap, with the IP as the key and the page visited as the value. For every IP that comes in, check whether it is already in the HashMap and insert it if it is not; by the end you will have a HashMap filled with unique IPs. Note that keying by IP alone gives the unique visitors across the whole site; to get them per page, keep a separate set of IPs for each page.
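A sketch of the IP-keyed HashMap described above, plus a per-page variant that maps each page to a HashSet of IPs (a swapped-in structure, not part of the original answer). `UniqueIps` and both method names are hypothetical:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class UniqueIps {
    // As described: IP -> first page seen. The keys are the site-wide unique IPs.
    static Map<String, String> firstPageByIp(List<String> lines) {
        Map<String, String> byIp = new HashMap<>();
        for (String line : lines) {
            String[] temp = line.split(" ");
            byIp.putIfAbsent(temp[0], temp[3]); // insert only if the IP is new
        }
        return byIp;
    }

    // Per-page variant: page -> set of distinct IPs that visited it.
    static Map<String, Set<String>> uniqueIpsPerPage(List<String> lines) {
        Map<String, Set<String>> byPage = new HashMap<>();
        for (String line : lines) {
            String[] temp = line.split(" ");
            byPage.computeIfAbsent(temp[3], k -> new HashSet<>()).add(temp[0]);
        }
        return byPage;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "128.33.100.1 2011-03-03 15:25 test.html",
            "128.33.100.2 2011-03-03 15:40 test.html",
            "128.33.100.1 2011-03-03 16:05 index.html");
        System.out.println(firstPageByIp(lines).size());                     // 2
        System.out.println(uniqueIpsPerPage(lines).get("test.html").size()); // 2
        System.out.println(uniqueIpsPerPage(lines).get("index.html").size()); // 1
    }
}
```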
Hope that gives you some help.
Convert the log entries into java.util.Calendar objects
and then do your calculations per unique IP address.
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;

public class Visit
{
    public static void main(String[] args) throws Exception
    {
        // Split the log line on whitespace: IP, date, time, page
        String[] stats = "128.33.100.1 2011-03-03 15:25 test.html".split("\\s+");
        System.out.println("IP Address: " + stats[0]);
        // HH is the 24-hour clock (hh would be 1-12)
        SimpleDateFormat formatter = new SimpleDateFormat("yyyy-MM-dd HH:mm");
        Date date = formatter.parse(stats[1] + " " + stats[2]);
        Calendar cal = Calendar.getInstance();
        cal.setTime(date);
        // Calendar.MONTH is zero-based (January == 0), so add 1 for display
        System.out.println("On Date: " + cal.get(Calendar.DATE) + "/" + (cal.get(Calendar.MONTH) + 1) + "/" + cal.get(Calendar.YEAR));
        System.out.println("At time: " + cal.get(Calendar.HOUR_OF_DAY) + ":" + cal.get(Calendar.MINUTE));
        System.out.println("Visited page: " + stats[3]);
        /*
         * You have the Calendar object now; perform your maths
         */
    }
}
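One way to "perform your maths" with the Calendar object is to bucket hits by `Calendar.HOUR_OF_DAY`. This sketch sums each hour of the day across all dates (so 15:00 on two different days lands in the same bucket); `HourOfDayCounts` and `countByHourOfDay` are hypothetical names:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Arrays;
import java.util.Calendar;
import java.util.List;

public class HourOfDayCounts {
    // Counts hits per hour of day (0-23), summed over all dates in the log.
    static int[] countByHourOfDay(List<String> lines) {
        SimpleDateFormat formatter = new SimpleDateFormat("yyyy-MM-dd HH:mm");
        Calendar cal = Calendar.getInstance();
        int[] counts = new int[24];
        for (String line : lines) {
            String[] stats = line.split("\\s+");
            try {
                cal.setTime(formatter.parse(stats[1] + " " + stats[2]));
            } catch (ParseException ex) {
                throw new IllegalArgumentException("Bad date/time in line: " + line, ex);
            }
            counts[cal.get(Calendar.HOUR_OF_DAY)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "128.33.100.1 2011-03-03 15:25 test.html",
            "128.33.100.2 2011-03-04 15:40 test.html",
            "128.33.100.1 2011-03-04 16:05 index.html");
        int[] counts = countByHourOfDay(lines);
        for (int h = 0; h < 24; h++) {
            if (counts[h] > 0) System.out.println(h + ":00 -> " + counts[h]);
        }
        // 15:00 -> 2
        // 16:00 -> 1
    }
}
```

If you need a separate count for each calendar hour instead, use the date and the hour together as a map key.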
While parsing each line from the log:
- Allocate a Dictionary (do this just once, at the start of the program)
- Extract the date and time part
- Convert it to a DateTime object (.NET) or the equivalent type in your programming language
- Set the minutes and seconds of the date time object to 0
- Add that date time to the Dictionary with a count of 0 if it isn't already there
- Increment the value of the dictionary entry whose key is the current parsed date time
- At the end, the dictionary holds the number of hits for each hour
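The same steps in Java, as a sketch: a `TreeMap` plays the role of the Dictionary, `LocalDateTime` stands in for .NET's DateTime, and `truncatedTo(ChronoUnit.HOURS)` zeroes the minutes and seconds. `HourlyHits` and `hourlyHits` are hypothetical names, and the date/time format is assumed to match the question's sample line:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class HourlyHits {
    private static final DateTimeFormatter FORMAT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm");

    static Map<LocalDateTime, Integer> hourlyHits(List<String> lines) {
        Map<LocalDateTime, Integer> hits = new TreeMap<>(); // allocated once, sorted by hour
        for (String line : lines) {
            String[] parts = line.split("\\s+");
            LocalDateTime t = LocalDateTime.parse(parts[1] + " " + parts[2], FORMAT);
            LocalDateTime hour = t.truncatedTo(ChronoUnit.HOURS); // minutes and seconds -> 0
            hits.merge(hour, 1, Integer::sum); // insert 1, or add 1 to the existing count
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "128.33.100.1 2011-03-03 15:25 test.html",
            "128.33.100.2 2011-03-03 15:40 test.html",
            "128.33.100.1 2011-03-03 16:05 index.html");
        hourlyHits(lines).forEach((h, n) -> System.out.println(h + " -> " + n));
        // 2011-03-03T15:00 -> 2
        // 2011-03-03T16:00 -> 1
    }
}
```

`Map.merge` folds the "add if missing" and "increment" steps into one call; unlike the sequential counter approach, this works even if the log lines are out of order.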