How can I capture a multiline pattern using a regular expression in Java?

I have a text file that I need to parse using regular expressions. The text that I need to capture is in multiline groups like this:

truck
zDoug
Doug's house
(123) 456-7890
Edoug@doug.com
30
61234.56
8/10/2003

vehicle
eRob
Rob's house
(987) 654-3210
Frob@rob.com

For this example I need to capture truck followed by the next seven lines. In other words, this "block" has 8 groups. This is what I've tried, but it will not capture the next line:

(truck)\n(\w).

NOTE: I'm using the program RegExr to test my regex before I port it to Java.
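Here is why that attempt stops early. \w matches exactly one word character and . matches exactly one more character, so nothing in the pattern can reach past the start of the second line. The small sketch below runs the question's pattern against the start of the sample data. The class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AttemptDemo {
    // Returns the text matched by the question's pattern, or null if there is no match.
    public static String firstMatch(String data) {
        // (truck) then one newline, then (\w) = one word character, then . = one more character.
        Matcher m = Pattern.compile("(truck)\\n(\\w).").matcher(data);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String data = "truck\nzDoug\nDoug's house\n";
        // Show the newline as a visible \n so the short match is easy to see.
        System.out.println(firstMatch(data).replace("\n", "\\n")); // prints truck\nzD
    }
}
```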


(?m)^truck(?:(?:\r\n|[\r\n]).+$)*

This assumes the whole text has been read into a single string (i.e., you're not reading a file line-by-line), but it doesn't assume the line separator is always \n, as your code does. At a minimum you should allow for \r\n and \r as well, which is what (?:\r\n|[\r\n]) does. But it still matches only one separator at a time, so the match stops before the double line separator at the end of the block.
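The next sketch covers the "whole text in a single string" part. It assumes Java 11+ for Files.readString and Path.of, uses a hypothetical class name, and takes the data file path as the first command-line argument. The pattern is the same as above, written with regex escapes (\\r, \\n) instead of literal CR/LF characters in the Java string.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WholeFileDemo {
    // "truck" at the start of a line, then any number of (one separator + non-empty line).
    private static final Pattern BLOCK =
        Pattern.compile("(?m)^truck(?:(?:\\r\\n|[\\r\\n]).+$)*");

    // Returns the first "truck" block in the text, or null if there is none.
    public static String firstBlock(String data) {
        Matcher m = BLOCK.matcher(data);
        return m.find() ? m.group() : null;
    }

    // Reads the entire file into one String so the regex can see across line boundaries.
    public static String firstBlockOfFile(Path file) throws IOException {
        return firstBlock(Files.readString(file));
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java WholeFileDemo data.txt
        System.out.println(firstBlockOfFile(Path.of(args[0])));
    }
}
```

Because the separator alternation accepts \r\n, \r or \n, the same code should work on files saved with Windows or Unix line endings.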

Once you've matched a block of data, you can split it on the line separators to get the individual lines. Here's an example:

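import java.util.regex.Matcher;
import java.util.regex.Pattern;

// data holds the entire file contents as a single String
Pattern p0 = Pattern.compile("(?m)^truck(?:(?:\r\n|[\r\n]).+$)*");
Matcher m = p0.matcher(data);
while (m.find())
{
  // m.group() is the whole block: "truck" plus each non-empty line after it
  String fullMatch = m.group();
  int n = 0;
  // split the block back into lines on any of the three separator forms
  for (String s : fullMatch.split("\r\n|[\r\n]"))
  {
    System.out.printf("line %d: %s%n", n++, s);
  }
}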

output:

line 0: truck
line 1: zDoug
line 2: Doug's house
line 3: (123) 456-7890
line 4: Edoug@doug.com
line 5: 30
line 6: 61234.56
line 7: 8/10/2003
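You might prefer the eight separate groups the question mentions. One option, sketched here and not part of the answer above, is to spell out the seven separator-plus-line steps explicitly, with truck as group 1 and the following lines as groups 2 to 8. Unlike the * version, this requires seven non-empty lines after truck, so a shorter block simply won't match. The class and method names are hypothetical, and String.repeat needs Java 11+.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EightGroups {
    // Group 1 = "truck"; groups 2..8 = the next seven lines, each preceded by one separator.
    private static final Pattern BLOCK = Pattern.compile(
        "(?m)^(truck)" + "(?:\\r\\n|[\\r\\n])(.+)".repeat(7));

    // Returns the eight captured lines of every matching block.
    public static List<List<String>> parse(String data) {
        List<List<String>> blocks = new ArrayList<>();
        Matcher m = BLOCK.matcher(data);
        while (m.find()) {
            List<String> lines = new ArrayList<>();
            for (int g = 1; g <= m.groupCount(); g++) {
                lines.add(m.group(g));
            }
            blocks.add(lines);
        }
        return blocks;
    }

    public static void main(String[] args) {
        String data = String.join("\n",
            "truck", "zDoug", "Doug's house", "(123) 456-7890", "Edoug@doug.com",
            "30", "61234.56", "8/10/2003", "",
            "vehicle", "eRob", "Rob's house", "(987) 654-3210", "Frob@rob.com");
        for (List<String> lines : parse(data)) {
            System.out.println(lines);
        }
    }
}
```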

I'm also assuming each line of data contains at least one character, and that the blank lines between data blocks are really empty, i.e., no spaces, TABs, or other invisible characters.
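The separator lines might contain stray spaces or TABs. One possible tweak, which is an assumption and not part of the answer above, is to require at least one non-whitespace character on each continuation line by using .*\S.* in place of .+. The comparison below uses hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BlankLineDemo {
    // The answer's pattern: any non-empty line (even whitespace-only) continues the block.
    public static final Pattern STRICT =
        Pattern.compile("(?m)^truck(?:(?:\\r\\n|[\\r\\n]).+$)*");
    // Assumed variant: a continuation line must contain a non-whitespace character,
    // so a whitespace-only "blank" line ends the block.
    public static final Pattern TOLERANT =
        Pattern.compile("(?m)^truck(?:(?:\\r\\n|[\\r\\n]).*\\S.*$)*");

    // Returns the first match of p in data, or null if there is none.
    public static String firstBlock(Pattern p, String data) {
        Matcher m = p.matcher(data);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // The "blank" separator line here actually holds two spaces and a TAB.
        String data = "truck\nzDoug\n  \t\nvehicle\neRob\n";
        System.out.println(firstBlock(STRICT, data).contains("vehicle"));   // true
        System.out.println(firstBlock(TOLERANT, data).contains("vehicle")); // false
    }
}
```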

(BTW: To test that regex in RegExr, remove the (?m) and check the multiline box instead. RegExr is powered by ActionScript, so the rules are a little different. For a Java-powered regex tester, check out RegexPlanet.)


This pattern should work: ((.*|\n)*)


I think that to span multiple lines, your Pattern should be compiled in DOTALL mode, something like:

Pattern p = Pattern.compile("truck\\n(.*?\\n){7}", Pattern.DOTALL); // reluctant .*? keeps each repetition to one line
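One thing to watch with DOTALL is that . then also matches line terminators. A greedy .* inside the repeated group can therefore run past the blank line into the next block, while a reluctant .*? stops at the nearest \n. The sketch below compares the two on the sample data. The class name is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DotallDemo {
    // Returns the first match of regex (compiled with DOTALL) in data, or null.
    public static String firstMatch(String regex, String data) {
        Matcher m = Pattern.compile(regex, Pattern.DOTALL).matcher(data);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String data = "truck\nzDoug\nDoug's house\n(123) 456-7890\nEdoug@doug.com\n"
            + "30\n61234.56\n8/10/2003\n\nvehicle\neRob\nRob's house\n"
            + "(987) 654-3210\nFrob@rob.com\n";
        // Greedy: the first repetition swallows as much as it can, so the match ends in the next block.
        System.out.println(firstMatch("truck\\n(.*\\n){7}", data).contains("vehicle")); // true
        // Reluctant: each repetition stops at the first \n, so the match ends after 8/10/2003.
        System.out.println(firstMatch("truck\\n(.*?\\n){7}", data).endsWith("8/10/2003\n")); // true
    }
}
```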