Java regular expression for extracting the data between tags
I am trying to write a regular expression that extracts the data from a string like
<B Att="text">Test</B><C>Test1</C>
The extracted output needs to be Test and Test1. This is what I have done till now:
public class HelloWorld {
public static void main(String[] args)
{
String s = "<B>Test</B>";
String reg = "<.*?>(.*)<\\/.*?>";
Pattern p = Pattern.compile(reg);
Matcher m = p.matcher(s);
while(m.find())
{
String s1 = m.group();
System.out.println(s1);
}
}
}
But this is producing the result <B>Test</B>. Can anybody point out what I am doing wrong?
Three problems:
- Your test string is incorrect.
- You need a non-greedy modifier in the group.
- You need to specify which group you want (group 1).
Try this:
String s = "<B Att=\"text\">Test</B><C>Test1</C>"; // <-- Fix 1
String reg = "<.*?>(.*?)</.*?>"; // <-- Fix 2
// ...
String s1 = m.group(1); // <-- Fix 3
You also don't need to escape a forward slash, so I removed that.
See it running on ideone.
(Also, don't use regular expressions to parse HTML - use an HTML parser.)
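For reference, here is a minimal sketch that puts the three fixes together in one complete program. The class name TagTextExtractor and the helper method extract are just illustrative names, not anything from the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagTextExtractor {
    // All quantifiers are lazy; group 1 captures the text between the tags.
    private static final Pattern TAG_TEXT = Pattern.compile("<.*?>(.*?)</.*?>");

    // Hypothetical helper: returns the text of each opening/closing tag pair found.
    static List<String> extract(String input) {
        List<String> results = new ArrayList<>();
        Matcher m = TAG_TEXT.matcher(input);
        while (m.find()) {
            results.add(m.group(1)); // group 1, not the whole match
        }
        return results;
    }

    public static void main(String[] args) {
        String s = "<B Att=\"text\">Test</B><C>Test1</C>";
        for (String text : extract(s)) {
            System.out.println(text);
        }
    }
}
```

With the string from the question this prints Test and Test1 on separate lines.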
If you are using Eclipse, there is a nice plugin that lets you check your regular expression without writing a class to test it. Here is the link: http://regex-util.sourceforge.net/update/ To show the view, choose Window -> Show View -> Other, and then Regex Util.
I hope it helps you in your fight with regular expressions.
It almost looks like you're trying to use regex on XML and/or HTML. I'd suggest not using regex and instead creating a parser or lexer to handle this type of arrangement.
I think the best way to handle XML nodes and get their values is to treat the input as XML.
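A rough sketch of what treating it as XML could look like with the DOM parser that ships with the JDK. One assumption to note: the string from the question has no single root element, so the sketch wraps it in a made-up <root> element before parsing. The class and method names are only illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XmlTextExtractor {
    // Hypothetical helper: returns the text content of each top-level element.
    static List<String> extract(String fragment) {
        // Assumption: the fragment lacks a root, so wrap it in an artificial one.
        String xml = "<root>" + fragment + "</root>";
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));

            List<String> results = new ArrayList<>();
            NodeList children = doc.getDocumentElement().getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                // Skip any whitespace text nodes between elements.
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    results.add(child.getTextContent());
                }
            }
            return results;
        } catch (Exception e) {
            // Parsing fails if the input is not well-formed XML.
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(extract("<B Att=\"text\">Test</B><C>Test1</C>"));
    }
}
```

This only works for well-formed input. Real-world HTML usually is not well-formed XML, so a dedicated HTML parser would be needed there.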
If you really want to stick to regex, try:
<B[^>]*>(.+?)</B\s*>
with the understanding that you will always get the value of the B tag.
Or if you want the value of any tag you will be using something like:
<.*?>(.*?)</.*?>
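One caveat with the any-tag pattern is that it does not check that the closing tag matches the opening one. A possible variation (my own sketch, not part of the patterns above) captures the tag name and uses a backreference so that only a closing tag with the same name ends the match. The names below are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchingTagExtractor {
    // Group 1: tag name; group 2: text up to the first closing tag with that name.
    // \b stops the name from being shortened by backtracking (e.g. "Bold" to "B").
    private static final Pattern SAME_TAG =
            Pattern.compile("<(\\w+)\\b[^>]*>(.*?)</\\1\\s*>");

    // Hypothetical helper: returns the content of each name-matched tag pair.
    static List<String> extract(String input) {
        List<String> results = new ArrayList<>();
        Matcher m = SAME_TAG.matcher(input);
        while (m.find()) {
            results.add(m.group(2)); // the content is now group 2
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(extract("<B Att=\"text\">Test</B><C>Test1</C>"));
    }
}
```

This still cannot handle a tag nested inside another tag of the same name, which is one more reason to prefer a real parser.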