Extracting Data from a String using Java/regex
I am trying to extract data from this String:
Hello there. Blah blahblah blah Building 016814 - Door 01002 BlahBLAHblah DUHHH 78787 blah, Blah blah Building Dr 4647 8989 BLAHBlah blah blahBlah
I am looking to loop through the String and print each Building and Door number to the console. However, since the two Building/Door instances are formatted differently from one another, I know that I will need to use two different regex patterns.
Here is my code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String myStr = "Hello there. Blah blahblah blah Building 016814 - Door 01002" +
                " BlahBLAHblah DUHHH 78787 blah, Blah blah Building Dr 4647 8989 BLAHBlah blah blahBlah";
        // p: "Building", then the first D or d, then the next two numbers
        Pattern p = Pattern.compile("Building.+?(?:[Dd]).+?(\\d+).+?(\\d+)");
        // p1: "Building", a number, then a D or d, then the next number
        Pattern p1 = Pattern.compile("Building.+?(\\d+).+?(?:[Dd]).+?(\\d+)");
        Matcher m = p.matcher(myStr);
        Matcher m1 = p1.matcher(myStr);
        while (m1.find() && m.find()) {
            System.out.print(" Building " + m1.group(1) + " " + "Door ");
            System.out.print(m1.group(2));
            System.out.print(" Building " + m.group(1) + " " + "Door " + m.group(2));
        }
    }
}
And here is my output:
Building 016814 Door 01002 Building 01002 Door 78787
I know it has something to do with my p regex pattern: it seems to be pulling in any numbers in between. I am a newbie to regex, so let me know if you need more info about this. Any help will be much appreciated.
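A minimal sketch of what goes wrong with p: it runs only that pattern on the sample string and prints its two captures. The class and method names (LazyMatchDemo, firstMatch) are illustrative. Because .+? can match anything, digits included, the first lazy .+? keeps growing until it reaches the first D or d after "Building", which is the D of "Door"; by then the building number 016814 has already been skipped.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LazyMatchDemo {
    // The question's pattern p, unchanged
    static final Pattern P = Pattern.compile("Building.+?(?:[Dd]).+?(\\d+).+?(\\d+)");

    // Returns the two captures of the first match, or "no match"
    static String firstMatch(String input) {
        Matcher m = P.matcher(input);
        if (!m.find()) {
            return "no match";
        }
        return m.group(1) + " " + m.group(2);
    }

    public static void main(String[] args) {
        String myStr = "Hello there. Blah blahblah blah Building 016814 - Door 01002"
                + " BlahBLAHblah DUHHH 78787 blah, Blah blah Building Dr 4647 8989 BLAHBlah blah blahBlah";
        // The first ".+?" stretches over " 016814 - " to reach the D of "Door",
        // so the two captures land on 01002 and 78787
        System.out.println(firstMatch(myStr));
    }
}
```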
I believe I've figured out the answer to my own question. Thank you all so much for your input; much appreciated.
I used:
Building[ ][Dd].+?(\\d+).+?(\\d+)
and my output was:
Building 016814 Door 01002 Building 4647 Door 8989
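A sketch of why this works, assuming the new pattern replaced p while p1 stayed as it was: "Building[ ][Dd]" only matches when a D or d directly follows "Building ", so on its own the pattern skips "Building 016814" and finds just the "Building Dr 4647 8989" form, while p1 still handles the first one. The class and helper names are illustrative. Each pattern gets its own loop here, so one pattern running out of matches does not stop the other; the question's m1.find() && m.find() condition stops as soon as either does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SelfAnswerDemo {
    // p1 from the question, unchanged: "Building", a number, then a D or d and the next number
    static final Pattern BUILDING_DOOR = Pattern.compile("Building.+?(\\d+).+?(?:[Dd]).+?(\\d+)");
    // Pattern from the self-answer: "Building", exactly one space, then a D or d
    static final Pattern BUILDING_DR = Pattern.compile("Building[ ][Dd].+?(\\d+).+?(\\d+)");

    // Runs one pattern over the whole input and formats every match
    static List<String> matches(Pattern pattern, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            out.add("Building " + m.group(1) + " Door " + m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        String myStr = "Hello there. Blah blahblah blah Building 016814 - Door 01002"
                + " BlahBLAHblah DUHHH 78787 blah, Blah blah Building Dr 4647 8989 BLAHBlah blah blahBlah";
        List<String> all = new ArrayList<>(matches(BUILDING_DOOR, myStr));
        all.addAll(matches(BUILDING_DR, myStr));
        all.forEach(System.out::println);
    }
}
```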
Your (.+?) parts are too broad. Try this:
"\\b((?:Building|Door|Dr)\\s\\d+)\\b"
Then just read what's in capture group 1. The pattern is case-sensitive by default; if you don't want that, compile it with the Pattern.CASE_INSENSITIVE flag.
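A sketch of this approach; the class and method names are illustrative. Each find() yields one label-plus-number pair from group 1, and passing Pattern.CASE_INSENSITIVE to Pattern.compile turns off case-sensitive matching. Note that in "Building Dr 4647 8989" only "Dr 4647" is matched, because the second number has no label in front of it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenDemo {
    static final String REGEX = "\\b((?:Building|Door|Dr)\\s\\d+)\\b";
    // Default: case-sensitive
    static final Pattern CASE_SENSITIVE = Pattern.compile(REGEX);
    // Same regex with case-sensitive matching turned off
    static final Pattern ANY_CASE = Pattern.compile(REGEX, Pattern.CASE_INSENSITIVE);

    // Collects group 1 of every match, in order of appearance
    static List<String> tokens(Pattern pattern, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String myStr = "Hello there. Blah blahblah blah Building 016814 - Door 01002"
                + " BlahBLAHblah DUHHH 78787 blah, Blah blah Building Dr 4647 8989 BLAHBlah blah blahBlah";
        System.out.println(tokens(CASE_SENSITIVE, myStr));
        System.out.println(tokens(ANY_CASE, "building 12 door 34"));
    }
}
```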
I'm guessing at the results you want here. You may actually be looking for this instead:
"\\b(Building\\s\\d+)\\s(Door\\s\\d+)\\b"
Edit: Based on your comments, the simplest way I can think of is this:
"\\bBuilding\\s(?:(\\d+)\\sDoor\\s(\\d+)|Dr\\s(\\d+)\\s(\\d+))\\b"
Removing the doubled backslashes for clarity:
/\bBuilding\s(?:(\d+)\sDoor\s(\d+)|Dr\s(\d+)\s(\d+))\b/
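A sketch of using this pattern in a loop; the class and method names are illustrative. Only one alternative takes part in any given match, so the groups of the other alternative come back null, and the code picks groups 1/2 or 3/4 accordingly. As written, the first alternative needs exactly one whitespace character between the building number and "Door", so it does not match the sample's "016814 - Door 01002"; the sketch adds an optional "(?:-\\s)?" before "Door", which is an assumption about the input format rather than part of the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AlternationDemo {
    // The answer's regex plus an optional "- " before "Door" (assumed separator)
    static final Pattern PATTERN = Pattern.compile(
            "\\bBuilding\\s(?:(\\d+)\\s(?:-\\s)?Door\\s(\\d+)|Dr\\s(\\d+)\\s(\\d+))\\b");

    static List<String> extract(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = PATTERN.matcher(input);
        while (m.find()) {
            // Group 1 is non-null only when the "Door" alternative matched
            boolean doorForm = m.group(1) != null;
            String building = doorForm ? m.group(1) : m.group(3);
            String door = doorForm ? m.group(2) : m.group(4);
            out.add("Building " + building + " Door " + door);
        }
        return out;
    }

    public static void main(String[] args) {
        String myStr = "Hello there. Blah blahblah blah Building 016814 - Door 01002"
                + " BlahBLAHblah DUHHH 78787 blah, Blah blah Building Dr 4647 8989 BLAHBlah blah blahBlah";
        extract(myStr).forEach(System.out::println);
    }
}
```

Dropping the "(?:-\\s)?" part restores the answer's pattern exactly; against the sample string it then finds only the "Building Dr 4647 8989" match.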