
Matching Multiple Patterns using Java Regex

I have a file containing records of the following format:

1285957838.880      1 192.168.10.228 TCP_HIT/200 1434 GET http://www.google.com/tools/dlpage/res/c/css/dlpage.css [02/Oct/2010:00:00:38 +0530] je02121 NONE/- text/css

Each record has 11 fields ([02/Oct/2010:00:00:38 +0530] counts as a single field).

I want to extract some of these fields, say 7, 8, and 9. Is it possible to extract them using a Java regex?

Can a regex be used to match multiple patterns for the above?

From the above record, I need to extract the fields

f1: http://www.google.com/tools/dlpage/res/c/css/dlpage.css  
f2: 02/Oct/2010:00:00:38 +0530  
f3: je02121


Do it sequentially rather than all in one pattern. If the input contains many records like this, split it into lines first, and extract the compiled Pattern to a constant so it is not recompiled for every line:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String input = "1285957838.880      1 192.168.10.228 TCP_HIT/200 1434 GET http://www.google.com/tools/dlpage/res/c/css/dlpage.css [02/Oct/2010:00:00:38 +0530] je02121 NONE/- text/css";
// Try a bracketed group first, then fall back to a run of non-whitespace characters.
Matcher matcher = Pattern.compile("\\[.*?\\]|\\S+").matcher(input);
int nr = 0;
while (matcher.find()) {
    System.out.println("Match no. " + ++nr + ": '" + matcher.group() + "'");
}

Output:

Match no. 1: '1285957838.880'
Match no. 2: '1'
Match no. 3: '192.168.10.228'
Match no. 4: 'TCP_HIT/200'
Match no. 5: '1434'
Match no. 6: 'GET'
Match no. 7: 'http://www.google.com/tools/dlpage/res/c/css/dlpage.css'
Match no. 8: '[02/Oct/2010:00:00:38 +0530]'
Match no. 9: 'je02121'
Match no. 10: 'NONE/-'
Match no. 11: 'text/css'

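Regex pattern explained (the backslashes are doubled because the pattern is written as a Java string literal):

\\[    match an opening square bracket
.*?    then as few characters as possible, up to
\\]    a closing square bracket
|      or
\\S+   a run of one or more non-whitespace characters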
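As a rough sketch of the "compiled Pattern as a constant" advice, the same pattern can be wrapped in a small helper that returns all fields of one record as a list, so that fields 7, 8, and 9 can be picked by index. The class name `LogFieldTokenizer` and the method `fields` are made up for illustration; field 8 still carries its brackets here, exactly as in the output above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class LogFieldTokenizer {
    // Compiled once and reused: a bracketed group, or a run of non-whitespace.
    private static final Pattern FIELD = Pattern.compile("\\[.*?\\]|\\S+");

    /** Returns the fields of one record, keeping "[...]" together as one field. */
    static List<String> fields(String line) {
        List<String> result = new ArrayList<>();
        Matcher matcher = FIELD.matcher(line);
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "1285957838.880      1 192.168.10.228 TCP_HIT/200 1434 GET "
                + "http://www.google.com/tools/dlpage/res/c/css/dlpage.css "
                + "[02/Oct/2010:00:00:38 +0530] je02121 NONE/- text/css";
        List<String> f = fields(line);
        // Fields are numbered from 1 in the question, list indexes start at 0.
        System.out.println("f1: " + f.get(6));
        System.out.println("f2: " + f.get(7));
        System.out.println("f3: " + f.get(8));
    }
}
```

For a whole file, one would call `fields` once per line; the constant keeps the pattern from being recompiled each time.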


Assuming that the only place where spaces are allowed within a field is between the brackets of the date field, and that there are no empty fields, you could use this:

Pattern regex = Pattern.compile(
    "^(?:\\S+\\s+){6}   # first 6 fields\n" +
    "(\\S+)\\s+         # field 7\n" +
    "\\[([^]]+)\\]\\s+  # field 8\n" +
    "(\\S+)             # field 9", 
    Pattern.MULTILINE | Pattern.COMMENTS);
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
    for (int i = 1; i <= regexMatcher.groupCount(); i++) {
        // matched text: regexMatcher.group(i)
        // match start: regexMatcher.start(i)
        // match end: regexMatcher.end(i)
    }
} 
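To make the loop above concrete, here is a minimal sketch that applies the same kind of pattern to input holding several records and collects the three captured groups per record. `RecordPattern` and `extractAll` are illustrative names, and the character class is written as `[^\\]]` with an escaped bracket purely for readability.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper around the pattern shown above.
class RecordPattern {
    private static final Pattern RECORD = Pattern.compile(
        "^(?:\\S+\\s+){6}     # first 6 fields\n" +
        "(\\S+)\\s+           # field 7\n" +
        "\\[([^\\]]+)\\]\\s+  # field 8, without the brackets\n" +
        "(\\S+)               # field 9",
        Pattern.MULTILINE | Pattern.COMMENTS);

    /** Returns fields 7, 8, and 9 for every record found in the text. */
    static List<List<String>> extractAll(String text) {
        List<List<String>> rows = new ArrayList<>();
        Matcher m = RECORD.matcher(text);
        while (m.find()) {
            rows.add(Arrays.asList(m.group(1), m.group(2), m.group(3)));
        }
        return rows;
    }

    public static void main(String[] args) {
        String line = "1285957838.880      1 192.168.10.228 TCP_HIT/200 1434 GET "
                + "http://www.google.com/tools/dlpage/res/c/css/dlpage.css "
                + "[02/Oct/2010:00:00:38 +0530] je02121 NONE/- text/css";
        // Two identical records, one per line, just to exercise MULTILINE.
        for (List<String> row : extractAll(line + "\n" + line)) {
            System.out.println(row);
        }
    }
}
```

Because group 2 sits inside the literal brackets, the date comes back without them. Note that `\\s+` also matches line breaks, so this relies on every record really having all of its fields.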


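Use split with the regex \s+ (written "\\s+" in Java source) and store the result in an array, say s. \s already covers tabs, and the quantifier should be greedy: a lazy +? would split at every single whitespace character and produce empty strings for the run of spaces after the first field.

Then s[6], s[7] + " " + s[8], and s[9] will be the expected result (the date field still carries its brackets, which can be stripped afterwards).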
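A minimal sketch of the split approach, assuming every record has the layout shown in the question. The class name `SplitFields` and the method `extract` are made up for illustration.

```java
// Hypothetical helper illustrating the split-based approach.
class SplitFields {
    /** Returns fields 7, 8, and 9 of one record; the date is returned without brackets. */
    static String[] extract(String line) {
        // A greedy \s+ treats each run of spaces or tabs as a single delimiter.
        String[] s = line.trim().split("\\s+");
        // The date was split at its inner space, so glue the two halves back together.
        String date = s[7] + " " + s[8];
        // Drop the leading '[' and the trailing ']'.
        date = date.substring(1, date.length() - 1);
        return new String[] { s[6], date, s[9] };
    }

    public static void main(String[] args) {
        String line = "1285957838.880      1 192.168.10.228 TCP_HIT/200 1434 GET "
                + "http://www.google.com/tools/dlpage/res/c/css/dlpage.css "
                + "[02/Oct/2010:00:00:38 +0530] je02121 NONE/- text/css";
        String[] f = extract(line);
        System.out.println("f1: " + f[0]);
        System.out.println("f2: " + f[1]);
        System.out.println("f3: " + f[2]);
    }
}
```

This is the simplest of the options, but it hard-codes the knowledge that the date is the only field containing a space.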


This option does not include the opening and closing brackets ([]) in the output:

    String input = "1285957838.880      1 192.168.10.228 TCP_HIT/200 1434 GET http://www.google.com/tools/dlpage/res/c/css/dlpage.css [02/Oct/2010:00:00:38 +0530] je02121 NONE/- text/css";
    Matcher matcher = Pattern.compile("(\\d+/+\\w+/+\\d.* \\+\\d+)|([^\\[]\\S+[^\\]])").matcher(input);
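Another way to get bracket-free output, sketched here as an alternative rather than as a completion of the snippet above, is to put capture groups into the simpler `\\[.*?\\]|\\S+` pattern and read whichever group matched. `BracketFreeFields` and `fields` are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: a different technique from the pattern shown above.
class BracketFreeFields {
    // Group 1: the text between brackets. Group 2: a plain non-whitespace run.
    private static final Pattern FIELD = Pattern.compile("\\[(.*?)\\]|(\\S+)");

    /** Returns the fields of one record, with bracketed fields stripped of their brackets. */
    static List<String> fields(String line) {
        List<String> result = new ArrayList<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            // group(1) is non-null only when the bracketed alternative matched.
            result.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "1285957838.880      1 192.168.10.228 TCP_HIT/200 1434 GET "
                + "http://www.google.com/tools/dlpage/res/c/css/dlpage.css "
                + "[02/Oct/2010:00:00:38 +0530] je02121 NONE/- text/css";
        List<String> f = fields(input);
        System.out.println("f2: " + f.get(7));
    }
}
```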