
Regex conditional

How would I write a RegEx to:

Find a match where the first instance of a > character is before the first instance of a < character.

(I am looking for bad HTML where the first closing > on a line has no opening < before it.)


It's a pretty bad idea to try to parse HTML with a regex, or even to try to detect broken HTML with one.

What happens, for example, when a line break inside a tag makes the > character the first character on a line? That is still valid HTML.
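To make the line-break problem concrete, here is a minimal Ruby sketch. The helper name `close_before_open?` and the sample markup are made up for illustration, and the helper assumes that a line containing a > but no < at all should also be flagged, which the question does not spell out. It applies the question's condition line by line with plain index lookups:

```ruby
# Valid HTML: the tag is simply split across several lines.
html = <<~HTML
  <a href="https://example.com"
     title="Example"
  >link</a>
HTML

# Hypothetical helper: true when the first '>' on the line comes before
# the first '<' (or when there is a '>' and no '<' at all).
def close_before_open?(line)
  close_at = line.index('>')
  open_at  = line.index('<')
  return false if close_at.nil?

  open_at.nil? || close_at < open_at
end

# Collect every line that a per-line check would report as broken.
flagged = html.lines.map(&:chomp).select { |line| close_before_open?(line) }
puts flagged.inspect
```

The last line of the sample gets flagged even though the markup is fine, which is the kind of false alarm any per-line check will raise.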

You might get some mileage from reading the answers to this question also: RegEx match open tags except XHTML self-contained tags


Would this work?

string =~ /^[^<]*>/

This anchors at the beginning of the line, skips every character that isn't an opening '<', and matches if it then finds a closing '>'.
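A quick way to try that pattern is to run it over a few sample lines. This is only a sketch: the constant `STRAY_CLOSE`, the helper `stray_close?`, and the sample strings are invented for the example.

```ruby
# The pattern from the answer above; the names are hypothetical.
STRAY_CLOSE = /^[^<]*>/

def stray_close?(line)
  line.match?(STRAY_CLOSE)
end

samples = [
  'text> more <b>bold</b>', # '>' appears before the first '<'
  '<p>fine</p>',            # line starts with '<'
  'no brackets here',       # neither character present
  'a > b'                   # '>' with no '<' anywhere
]

results = samples.map { |line| stray_close?(line) }
puts results.inspect
```

Note that a line with a > and no < at all also matches, and that in Ruby `^` matches at the start of every line of a multi-line string, not only at the start of the string.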


^[^<>]*>

If you need the corresponding < as well:

^[^<>]*>[^<]*<

If complete tags may appear on the line before the stray >:

^[^<>]*(?:<[^<>]+>[^<>]*)*>

Note that it can give false positives, e.g.

<!-- > -->

is valid HTML, but the regex will flag it.
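The three patterns can be compared side by side with a small Ruby sketch. The constant names and sample lines are made up for illustration; the patterns themselves are copied from above.

```ruby
# Hypothetical names for the three patterns in this answer.
STRAY_FIRST     = /^[^<>]*>/                     # '>' before any '<'
STRAY_THEN_OPEN = /^[^<>]*>[^<]*</               # same, plus a '<' later on
STRAY_AFTER_TAG = /^[^<>]*(?:<[^<>]+>[^<>]*)*>/  # complete tags may come first

patterns = [STRAY_FIRST, STRAY_THEN_OPEN, STRAY_AFTER_TAG]

lines = [
  'oops> <p>text</p>', # stray '>' at the start of the content
  '<b>ok</b> oops>',   # stray '>' after a well-formed element
  '<p>fine</p>',       # nothing wrong
  '<!-- > -->'         # valid comment containing '>'
]

# Map each line to [first, then_open, after_tag] match results.
report = lines.map { |line| [line, patterns.map { |re| line.match?(re) }] }.to_h

report.each { |line, flags| puts "#{line.inspect} => #{flags.inspect}" }
```

Only the third pattern catches the stray > that follows a well-formed element, and it is also the one that flags the comment: it reads `<!-- >` as a tag and then treats the > of `-->` as stray.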
