Java regular expression for extracting the data between tags

I am trying to write a regular expression that extracts the data from a string like

<B Att="text">Test</B><C>Test1</C>

The extracted output needs to be Test and Test1. This is what I have so far:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HelloWorld {
    public static void main(String[] args)
    {
        String s = "<B>Test</B>";
        String reg = "<.*?>(.*)<\\/.*?>";
        Pattern p = Pattern.compile(reg);
        Matcher m = p.matcher(s);
        while(m.find())
        {
            String s1 = m.group();
            System.out.println(s1);
        }
    }
}

But this is producing the result <B>Test</B>. Can anybody point out what I am doing wrong?


Three problems:

  • Your test string is incorrect: it does not match the example input from your question.
  • You need a non-greedy modifier in the group.
  • You need to specify which group you want (group 1).

Try this:

String s = "<B Att=\"text\">Test</B><C>Test1</C>"; // <-- Fix 1
String reg = "<.*?>(.*?)</.*?>";                   // <-- Fix 2
// ...
String s1 = m.group(1);                            // <-- Fix 3

You also don't need to escape a forward slash, so I removed that.
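Put together, the three fixes might look like the following minimal sketch. The class name TagTextExtractor and the helper method extract are made up for illustration; only the pattern and the use of group 1 come from the snippet above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, used only for this illustration.
class TagTextExtractor {
    // Lazy quantifiers (.*?) keep each match confined to one element.
    private static final Pattern TAG_TEXT = Pattern.compile("<.*?>(.*?)</.*?>");

    // Hypothetical helper: returns the text captured between each pair of tags.
    static List<String> extract(String input) {
        List<String> texts = new ArrayList<>();
        Matcher m = TAG_TEXT.matcher(input);
        while (m.find()) {
            texts.add(m.group(1)); // group 1 is the text between the tags
        }
        return texts;
    }

    public static void main(String[] args) {
        // Prints "Test" and then "Test1", one per line.
        for (String text : extract("<B Att=\"text\">Test</B><C>Test1</C>")) {
            System.out.println(text);
        }
    }
}
```

Calling m.group() with no argument returns the whole match, tags included, which is why the original code printed <B>Test</B>.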

See it running on ideone.

(Also, don't use regular expressions to parse HTML - use an HTML parser.)


If you are using Eclipse, there is a nice plugin that lets you check a regular expression without writing a class to test it. Here is the link: http://regex-util.sourceforge.net/update/ To show the view, choose Window -> Show View -> Other, and then Regex Util.

I hope it helps you in your fight with regular expressions.


It almost looks like you're trying to use regex on XML and/or HTML. I'd suggest not using regex and instead creating a parser or lexer to handle this type of arrangement.


I think the best way to get the values of XML nodes is simply to treat the input as XML.
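As a rough sketch of the XML approach, the JDK's built-in DOM parser can read the example string. One assumption to note: the input from the question has two top-level elements, so it is not a well-formed document on its own, and the sketch wraps it in a made-up <root> element first. The class name XmlTextExtractor and the method extract are also invented for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical class name, used only for this illustration.
class XmlTextExtractor {
    // Returns the text content of each top-level element in the fragment.
    static List<String> extract(String fragment) {
        // Assumption: wrap the fragment in a made-up <root> element,
        // because an XML document must have exactly one root.
        String xml = "<root>" + fragment + "</root>";
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            List<String> texts = new ArrayList<>();
            NodeList children = doc.getDocumentElement().getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    texts.add(child.getTextContent());
                }
            }
            return texts;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        // Prints "Test" and then "Test1", one per line.
        for (String text : extract("<B Att=\"text\">Test</B><C>Test1</C>")) {
            System.out.println(text);
        }
    }
}
```

Unlike the regex, this handles attributes containing ">" and nested elements, but it only works when the input really is well-formed XML.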

If you really want to stick to regex try:

<B[^>]*>(.+?)</B\s*>

Note that this will only ever return the content of the B tag.

Or, if you want the value of any tag, use something like:

<.*?>(.*?)</.*?>
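To compare the two patterns side by side, here is a small sketch run against the example string from the question. The class name PatternComparison and the helper findAll are hypothetical. Note that the backslash in \s has to be doubled inside a Java string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, used only for this illustration.
class PatternComparison {
    // Hypothetical helper: collects group 1 of every match of regex in input.
    static List<String> findAll(String regex, String input) {
        List<String> results = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            results.add(m.group(1));
        }
        return results;
    }

    public static void main(String[] args) {
        String input = "<B Att=\"text\">Test</B><C>Test1</C>";

        // Tag-specific pattern: only the B element matches.
        System.out.println(findAll("<B[^>]*>(.+?)</B\\s*>", input)); // [Test]

        // Generic pattern: every element matches.
        System.out.println(findAll("<.*?>(.*?)</.*?>", input)); // [Test, Test1]
    }
}
```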