Parsing Jetty log records
For the given input example:
70.80.110.200 - - [12/Apr/2011:05:47:34 +0000] "GET /notify/click?r=http://www.xxxxxx.com/hello_world&rt=1302587231462&iid=00000 HTTP/1.1" 302 0 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; FunWebProducts; HotbarSearchToolbar 1.1; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C; AskTbFWV5/5.11.3.15590)" 4 4
I would like to define the following parsing logic (probably with a regex):
- Extract the IP (four groups of 1 to 3 digits separated by dots) => 70.80.110.200
- Extract the date => 12/Apr/2011
- Extract the time => 05:47:34
- Extract the URI (the path inside the quoted request line, after the method and before the protocol) => /notify/click?r=http://www.xxxxxx.com/hello_world&rt=1302587231462&iid=00000
Try with:
/^([0-9.]+).*?\[(\d+\/\w+\/\d+):(\d+:\d+:\d+).*?\].*?(\/[^ ]*).*$/
As you would expect, groups 1 to 4 hold the data you specified; for example, .group(3) is the time.
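The regex above numbers its groups 1 to 4, so callers have to remember that group 3 is the time. Java 7 and later also support named groups with the (?<name>...) syntax, which makes the extraction self-describing. A minimal sketch, assuming the same regex structure; the class and method names (NamedGroupDemo, extract) are hypothetical, and the user-agent string in the demo line is shortened:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroupDemo {
    // Same structure as the numbered-group regex, but every group has a name,
    // so the caller asks for "time" instead of remembering group 3.
    static final Pattern LOG = Pattern.compile(
            "^(?<ip>[0-9.]+).*?\\[(?<date>\\d+/\\w+/\\d+):(?<time>\\d+:\\d+:\\d+).*?\\].*?(?<uri>/[^ ]*).*$");

    // Returns {ip, date, time, uri}, or null if the line does not match.
    static String[] extract(String line) {
        Matcher m = LOG.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group("ip"), m.group("date"), m.group("time"), m.group("uri") };
    }

    public static void main(String[] args) {
        // Demo line with a shortened user-agent string.
        String line = "70.80.110.200 - - [12/Apr/2011:05:47:34 +0000] \"GET /notify/click?r=http://www.xxxxxx.com/hello_world&rt=1302587231462&iid=00000 HTTP/1.1\" 302 0 \"-\" \"Mozilla/4.0 (compatible; MSIE 8.0)\" 4 4";
        String[] f = extract(line);
        System.out.println("time = " + f[2]);
    }
}
```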
Ensure Jetty is configured to write NCSA-compatible logs; then you can use any NCSA log analyzer to analyze them.
If you want to do it by hand, this is a nice use case for regular expressions.
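Doing it by hand, one regex can split every field of an NCSA combined-format line rather than just the four asked for. A minimal sketch, assuming the line follows the combined layout (host, ident, user, [timestamp], "request", status, bytes, "referer", "user agent"); whatever follows the user agent (the "4 4" in the example) is captured as "extra", since its meaning depends on how the Jetty request log is configured. NcsaLogParser, parse, and the group names are hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NcsaLogParser {
    // host ident user [timestamp] "method uri protocol" status bytes "referer" "agent" extra
    static final Pattern COMBINED = Pattern.compile(
            "^(?<ip>\\S+) (?<ident>\\S+) (?<user>\\S+) \\[(?<ts>[^\\]]+)\\] "
            + "\"(?<method>\\S+) (?<uri>\\S+) (?<proto>[^\"]*)\" "
            + "(?<status>\\d{3}) (?<bytes>\\S+) \"(?<referer>[^\"]*)\" \"(?<agent>[^\"]*)\"\\s*(?<extra>.*)$");

    static final String[] FIELDS = {
        "ip", "ident", "user", "ts", "method", "uri", "proto",
        "status", "bytes", "referer", "agent", "extra"
    };

    // Returns the fields in the order above, or an empty map if the line does not match.
    static Map<String, String> parse(String line) {
        Map<String, String> out = new LinkedHashMap<>();
        Matcher m = COMBINED.matcher(line);
        if (m.matches()) {
            for (String name : FIELDS) {
                out.put(name, m.group(name));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Demo line with a shortened user-agent string.
        String line = "70.80.110.200 - - [12/Apr/2011:05:47:34 +0000] \"GET /notify/click?r=http://www.xxxxxx.com/hello_world&rt=1302587231462&iid=00000 HTTP/1.1\" 302 0 \"-\" \"Mozilla/4.0 (compatible; MSIE 8.0)\" 4 4";
        parse(line).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

The URI group uses \S+, which relies on the logged request line not containing unescaped spaces; a malformed request would simply fail to match and yield an empty map.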
Complete code sample (based on hsz's answer):
import java.util.*;
import java.util.regex.*;

public class RegexDemo {
    public static void main(String[] argv) {
        String pat = "^([0-9.]*).*?\\[(\\d+\\/\\w+\\/\\d+):(\\d+:\\d+:\\d+).*?\\].*?(\\/[^ ]*).*$";
        Pattern p = Pattern.compile(pat);
        String target = "70.80.110.200 - - [12/Apr/2011:05:47:34 +0000] \"GET /notify/click?r=http://www.xxxxxx.com/hello_world&rt=1302587231462&iid=00000 HTTP/1.1\" 302 0 \"-\" \"Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; FunWebProducts; HotbarSearchToolbar 1.1; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C; AskTbFWV5/5.11.3.15590)\" 4 4";
        Matcher m = p.matcher(target);
        System.out.println("pattern: " + pat);
        System.out.println("target: " + target);
        if (m.matches()) {
            System.out.println("found");
            // group(0) is the whole line; groups 1 to 4 are ip, date, time, uri
            for (int i = 0; i <= m.groupCount(); ++i) {
                System.out.println(m.group(i));
            }
        }
    }
}
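The sample returns the date and time as plain strings. If you need them as a real time value, java.time (Java 8+) can parse the whole bracketed timestamp, offset included. A minimal sketch, assuming you have captured the full text between the brackets (for example with a \[([^\]]+)\] group) rather than splitting it at the first colon; LogTimestamp and parse are hypothetical names:

```java
import java.time.LocalDate;
import java.time.LocalTime;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class LogTimestamp {
    // NCSA timestamps look like "12/Apr/2011:05:47:34 +0000".
    // Locale.ENGLISH makes "Apr" parse regardless of the JVM's default locale.
    static final DateTimeFormatter NCSA =
            DateTimeFormatter.ofPattern("dd/MMM/yyyy:HH:mm:ss Z", Locale.ENGLISH);

    static OffsetDateTime parse(String bracketed) {
        return OffsetDateTime.parse(bracketed, NCSA);
    }

    public static void main(String[] args) {
        OffsetDateTime t = parse("12/Apr/2011:05:47:34 +0000");
        LocalDate date = t.toLocalDate(); // the date part of the requirement
        LocalTime time = t.toLocalTime(); // the time part of the requirement
        System.out.println(date + " " + time + " " + t.getOffset());
    }
}
```

Keeping the offset in an OffsetDateTime avoids silently assuming the server's time zone when comparing entries from different logs.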
You can try the following:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogLineDemo {
    public static void main(String[] args) {
        String s = "70.80.110.200 - - [12/Apr/2011:05:47:34 +0000] \"GET /notify/click?r=http://www.xxxxxx.com/hello_world&rt=1302587231462&iid=00000 HTTP/1.1\" 302 0 \"-\" \"Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; FunWebProducts; HotbarSearchToolbar 1.1; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C; AskTbFWV5/5.11.3.15590)\" 4 4";
        Pattern p = Pattern.compile(
                "^(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}).*?" + // ip
                "\\[([^:]*):" +                                     // date
                "(\\d{2}:\\d{2}:\\d{2}).*?\\].*?" +                 // time
                "(/[^\\s]*).*$");                                   // uri
        Matcher m = p.matcher(s);
        if (m.find()) {
            String ip = m.group(1);   // 70.80.110.200
            String date = m.group(2); // 12/Apr/2011
            String time = m.group(3); // 05:47:34
            String uri = m.group(4);  // request path plus query string
            System.out.println(ip + " " + date + " " + time + " " + uri);
        }
    }
}
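The \d{1,3} octets in the pattern above accept out-of-range values such as 999.1.1.1. That rarely matters for addresses Jetty itself wrote, but if you want to validate the IP field, a stricter octet pattern can be applied to the captured group. A minimal sketch, assuming IPv4 only (an IPv6 address in the first field would be rejected); Ipv4Check and isIpv4 are hypothetical names:

```java
import java.util.regex.Pattern;

public class Ipv4Check {
    // One octet: 250-255, 200-249, 100-199, or 0-99 (no leading zeros).
    static final String OCTET = "(?:25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)";
    static final Pattern IPV4 = Pattern.compile(OCTET + "(?:\\." + OCTET + "){3}");

    static boolean isIpv4(String s) {
        return IPV4.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isIpv4("70.80.110.200"));
        System.out.println(isIpv4("999.1.1.1"));
    }
}
```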