
Need a regular expression for an HTML element where the order of attributes doesn't matter

I need a regular expression to detect a span element where the order of id and class doesn't matter. The class name is always the same, and the id is always a fixed number of digits, for example:

<span class="className" id="123">

or

<span id="321" class="className" >

My attempt at a regular expression in Java was:

String pattern = "<span class=\"className\" id=\"\\d*\">";

But this only matches one of the two versions. Can somebody help?

Thanks, hansa


Don't parse HTML with regular expressions. HTML isn't regular.


This should do it:

String r = "<span (?=[^<>]*\\bclass=\"className\")[^<>]*\\bid=\"(\\d+)\"[^<>]*>";

The lookahead confirms that the span is of the desired class without consuming any characters. Then the rest of the regex, starting from the same position, searches for the id attribute and captures its value. The [^<>]* takes care of any other attributes that might be present, while ensuring that all matching occurs within the tag. (Technically, angle brackets can appear in attribute values, but you probably don't have to worry about that.)
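To make the lookahead approach concrete, here is a minimal sketch that runs the pattern above against both attribute orders from the question. The class name SpanIdLookahead, the helper findIds, and the sample markup are made up for illustration; only the pattern itself comes from this answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SpanIdLookahead {
    // The pattern from the answer above, unchanged.
    private static final Pattern SPAN = Pattern.compile(
        "<span (?=[^<>]*\\bclass=\"className\")[^<>]*\\bid=\"(\\d+)\"[^<>]*>");

    // Hypothetical helper: returns the id of every matching span, in document order.
    static List<String> findIds(String html) {
        List<String> ids = new ArrayList<>();
        Matcher m = SPAN.matcher(html);
        while (m.find()) {
            ids.add(m.group(1)); // group 1 is the captured run of digits
        }
        return ids;
    }

    public static void main(String[] args) {
        // Sample input (an assumption): both orders from the question,
        // plus one span of a different class that should be skipped.
        String html = "<span class=\"className\" id=\"123\">a</span>"
                    + "<span id=\"321\" class=\"className\" >b</span>"
                    + "<span id=\"999\" class=\"other\">c</span>";
        System.out.println(findIds(html)); // prints [123, 321]
    }
}
```

Because the class check lives in a lookahead, the id can sit before or after the class attribute, and the third span is rejected since its tag never contains class="className".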


I would do a two step version, first finding the span tag with:

<span[^>]*class=\"className\"[^>]*>

And then dig out the id from the tags that match the first pattern with

id=\"(\d+)\"

As others have pointed out, it's not a good idea to parse HTML with regular expressions. But for quick-and-dirty data processing, this is how I would do it.
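The two steps could be wired together roughly as follows. This is only a sketch: the class name SpanIdTwoStep and the helper findIds are invented for the example, the class is spelled className as in the question, and the \d is doubled to \\d because the pattern sits inside a Java string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SpanIdTwoStep {
    // Step 1: any span tag that carries class="className".
    private static final Pattern TAG =
        Pattern.compile("<span[^>]*class=\"className\"[^>]*>");
    // Step 2: the numeric id inside a tag that step 1 matched.
    private static final Pattern ID = Pattern.compile("id=\"(\\d+)\"");

    // Hypothetical helper: collects the ids of all matching span tags.
    static List<String> findIds(String html) {
        List<String> ids = new ArrayList<>();
        Matcher tag = TAG.matcher(html);
        while (tag.find()) {
            // Search only within the text of the tag that was just matched.
            Matcher id = ID.matcher(tag.group());
            if (id.find()) {
                ids.add(id.group(1));
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        String html = "<span class=\"className\" id=\"123\">a</span>"
                    + "<span id=\"321\" class=\"className\" >b</span>";
        System.out.println(findIds(html)); // prints [123, 321]
    }
}
```

Splitting the work this way keeps each pattern simple, at the cost of a second matcher per tag.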
