Matching Multiple Patterns using Java Regex
I have a file containing records of the following format:
1285957838.880 1 192.168.10.228 TCP_HIT/200 1434 GET http://www.google.com/tools/dlpage/res/c/css/dlpage.css [02/Oct/2010:00:00:38 +0530] je02121 NONE/- text/css
Each record has 11 fields ([02/Oct/2010:00:00:38 +0530] is a single field).
I want to extract some of these fields, say fields 7, 8 and 9. Is it possible to extract them using Java regex?
Can a regex be used to match multiple patterns for the above?
From the above record, I need to extract the fields
f1: http://www.google.com/tools/dlpage/res/c/css/dlpage.css
f2: 02/Oct/2010:00:00:38 +0530
f3: je02121
Match the fields one after another with find() rather than capturing them all in one pattern. If you have many lines like this, split the input into lines first and keep the compiled Pattern in a constant:
String input = "1285957838.880 1 192.168.10.228 TCP_HIT/200 1434 GET http://www.google.com/tools/dlpage/res/c/css/dlpage.css [02/Oct/2010:00:00:38 +0530] je02121 NONE/- text/css";
Matcher matcher = Pattern.compile("\\[.*?\\]|\\S+").matcher(input);
int nr = 0;
while (matcher.find()) {
    System.out.println("Match no. " + ++nr + ": '" + matcher.group() + "'");
}
Output:
Match no. 1: '1285957838.880'
Match no. 2: '1'
Match no. 3: '192.168.10.228'
Match no. 4: 'TCP_HIT/200'
Match no. 5: '1434'
Match no. 6: 'GET'
Match no. 7: 'http://www.google.com/tools/dlpage/res/c/css/dlpage.css'
Match no. 8: '[02/Oct/2010:00:00:38 +0530]'
Match no. 9: 'je02121'
Match no. 10: 'NONE/-'
Match no. 11: 'text/css'
Regex Pattern explained:
\\[    match an opening square bracket
.*?    and anything (as few characters as possible) up to a
\\]    closing square bracket
|      or
\\S+   any sequence of one or more non-whitespace characters
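To get from the list of matches to the three fields the question asks for, the same pattern can be kept in a constant and the matches collected into a list. This is only a minimal sketch: the class name LogFieldTokenizer and the helpers tokenize and stripBrackets are made up for illustration, and it assumes every line has at least 9 fields.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class LogFieldTokenizer {
    // Compiled once and reused: a bracketed group, or a run of non-whitespace.
    private static final Pattern FIELD = Pattern.compile("\\[.*?\\]|\\S+");

    // Returns all fields of one log line, in order of appearance.
    static List<String> tokenize(String line) {
        List<String> fields = new ArrayList<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            fields.add(m.group());
        }
        return fields;
    }

    // Removes one pair of surrounding square brackets, if present.
    static String stripBrackets(String field) {
        if (field.length() >= 2 && field.startsWith("[") && field.endsWith("]")) {
            return field.substring(1, field.length() - 1);
        }
        return field;
    }

    public static void main(String[] args) {
        String line = "1285957838.880 1 192.168.10.228 TCP_HIT/200 1434 GET "
                + "http://www.google.com/tools/dlpage/res/c/css/dlpage.css "
                + "[02/Oct/2010:00:00:38 +0530] je02121 NONE/- text/css";
        List<String> fields = tokenize(line);
        // Fields 7, 8 and 9 sit at zero-based indexes 6, 7 and 8.
        System.out.println("f1: " + fields.get(6));
        System.out.println("f2: " + stripBrackets(fields.get(7)));
        System.out.println("f3: " + fields.get(8));
    }
}
```

For a whole file, each line could be passed to tokenize in turn, for example while reading with a BufferedReader.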
Assuming that the only place where spaces are allowed within a field is between the brackets in the date field, and that there are no empty fields, you could use this:
Pattern regex = Pattern.compile(
    "^(?:\\S+\\s+){6}    # first 6 fields\n" +
    "(\\S+)\\s+          # field 7\n" +
    "\\[([^\\]]+)\\]\\s+ # field 8 (without the brackets)\n" +
    "(\\S+)              # field 9",
    Pattern.MULTILINE | Pattern.COMMENTS);
// subjectString holds the log text (one or more lines)
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
    for (int i = 1; i <= regexMatcher.groupCount(); i++) {
        // matched text: regexMatcher.group(i)
        // match start: regexMatcher.start(i)
        // match end: regexMatcher.end(i)
    }
}
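As a rough, self-contained illustration of the anchored approach, the sketch below runs an equivalent pattern (written without Pattern.COMMENTS) over two lines at once. The class name AnchoredFieldExtractor and the method extractAll are hypothetical, and the second sample record is invented for the example; only the first comes from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class AnchoredFieldExtractor {
    private static final Pattern RECORD = Pattern.compile(
            "^(?:\\S+\\s+){6}"          // skip the first 6 fields
            + "(\\S+)\\s+"              // group 1: field 7
            + "\\[([^\\]]+)\\]\\s+"     // group 2: field 8, brackets excluded
            + "(\\S+)",                 // group 3: field 9
            Pattern.MULTILINE);         // let ^ match at the start of every line

    // Returns one {field7, field8, field9} array per matching record.
    static List<String[]> extractAll(String text) {
        List<String[]> records = new ArrayList<>();
        Matcher m = RECORD.matcher(text);
        while (m.find()) {
            records.add(new String[] { m.group(1), m.group(2), m.group(3) });
        }
        return records;
    }

    public static void main(String[] args) {
        String text =
                "1285957838.880 1 192.168.10.228 TCP_HIT/200 1434 GET "
                + "http://www.google.com/tools/dlpage/res/c/css/dlpage.css "
                + "[02/Oct/2010:00:00:38 +0530] je02121 NONE/- text/css\n"
                // Invented second record, for illustration only.
                + "1285957839.120 2 192.168.10.229 TCP_MISS/200 512 GET "
                + "http://example.com/index.html "
                + "[02/Oct/2010:00:00:39 +0530] je02122 DIRECT/- text/html";
        for (String[] r : extractAll(text)) {
            System.out.println(r[0] + " | " + r[1] + " | " + r[2]);
        }
    }
}
```

One caveat with this design: \s also matches line breaks, so a line with too few fields could make a match spill into the following line. Matching each line separately avoids that.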
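Use split with the regex "\\s+" (\s already covers tabs, and a greedy + treats a run of whitespace as one separator) and store the results in an array, say s.
Then s[6], s[7] + " " + s[8] and s[9] will be the expected result. The date is split at its inner space, so its two halves have to be rejoined, and it still carries the surrounding brackets.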
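The split approach might look roughly like this. It is a sketch under the same assumptions as the answer (whitespace only between fields and inside the date, at least 10 whitespace-separated tokens per line); the class name SplitFieldExtractor and the method extract are made up for illustration.

```java
// Hypothetical helper class, not part of the original answer.
class SplitFieldExtractor {
    // Returns {field7, field8 without brackets, field9} for one log line.
    static String[] extract(String line) {
        String[] s = line.trim().split("\\s+");
        // The date was split at its inner space: rejoin the two halves.
        String date = s[7] + " " + s[8];
        // Drop the leading '[' and the trailing ']'.
        date = date.substring(1, date.length() - 1);
        return new String[] { s[6], date, s[9] };
    }

    public static void main(String[] args) {
        String line = "1285957838.880 1 192.168.10.228 TCP_HIT/200 1434 GET "
                + "http://www.google.com/tools/dlpage/res/c/css/dlpage.css "
                + "[02/Oct/2010:00:00:38 +0530] je02121 NONE/- text/css";
        for (String field : extract(line)) {
            System.out.println(field);
        }
    }
}
```

Note that rejoining with a single space normalizes whatever whitespace originally separated the two halves of the date.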
This option does not include the opening and closing brackets ([]) in the output:
String input = "1285957838.880 1 192.168.10.228 TCP_HIT/200 1434 GET http://www.google.com/tools/dlpage/res/c/css/dlpage.css [02/Oct/2010:00:00:38 +0530] je02121 NONE/- text/css";
Matcher matcher = Pattern.compile("(\\d+/+\\w+/+\\d.* \\+\\d+)|([^\\[]\\S+[^\\]])").matcher(input);
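Traced by hand, the pattern above appears to pull neighbouring whitespace into some matches and to skip one-character fields such as "1", because its second alternative needs at least three characters and its last character class also accepts a space. If the goal is only to leave the brackets out, capture groups are one possible alternative. The following is a sketch, not the answerer's method; the class name BracketFreeTokenizer and the method tokenize are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class BracketFreeTokenizer {
    // Group 1: the text inside [...]; group 2: a plain non-whitespace field.
    private static final Pattern FIELD =
            Pattern.compile("\\[([^\\]]*)\\]|(\\S+)");

    // Returns all fields of one line, with brackets removed from bracketed ones.
    static List<String> tokenize(String line) {
        List<String> fields = new ArrayList<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            // Exactly one of the two groups participates in each match.
            fields.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return fields;
    }

    public static void main(String[] args) {
        String input = "1285957838.880 1 192.168.10.228 TCP_HIT/200 1434 GET "
                + "http://www.google.com/tools/dlpage/res/c/css/dlpage.css "
                + "[02/Oct/2010:00:00:38 +0530] je02121 NONE/- text/css";
        List<String> fields = tokenize(input);
        // Fields 7, 8 and 9 sit at zero-based indexes 6, 7 and 8.
        System.out.println(fields.subList(6, 9));
    }
}
```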