
Anything but subexpressions

I am trying to write a regex that identifies relative src paths in PHP. My idea was to use a lookahead (?= followed by a negation ^ and a subexpression (http), but this doesn't work. The ^ negation works for a single character but not for a subexpression. Is there an && operator or something?

 <img.*?src=[\'\"]\(?=^(http))

I need it to take the entire http, or else imgs starting with h, t or p will be prejudiced against. Any suggestions? Is this task too big for regex?


You can use negative lookahead, which is (?!...) instead of (?=...). For your example (I'd put the anchor at the start):

^(?!http)

This reads: start of string, not immediately followed by "http". The lookahead consumes no characters; it only checks what comes next.
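
As a rough sketch of how the anchored negative lookahead behaves in PHP, the snippet below filters a list of paths with preg_grep. The sample paths are made up for illustration and are not from the question.

```php
<?php
// Hypothetical sample paths, for illustration only.
$paths = [
    'images/a.png',
    'http://example.com/a.png',
    'https://example.com/b.png',
    'help/c.png',
];

// Keep only the entries that do NOT start with "http".
// preg_grep preserves the original keys, so array_values reindexes the result.
$relative = array_values(preg_grep('/^(?!http)/', $paths));

print_r($relative);
```

Note that 'help/c.png' is kept even though it begins with "h", because the lookahead rejects only the full sequence "http".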

Edit: since you updated with a fuller example:

<img [^>]*src=['"](?!http)([^'"]+)['"]

                          ^------^ - this capturing group captures the link
                                     which doesn't start with http
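
A minimal sketch of using this pattern from PHP with preg_match_all, assuming the markup is available as a string. The sample HTML and the variable names are assumptions made for illustration; the pattern itself is the one above, wrapped in ~ delimiters.

```php
<?php
// Hypothetical sample markup, not taken from the original question.
$html = <<<'HTML'
<img src="images/logo.png" alt="logo">
<img class="x" src='http://example.com/a.png'>
<img src="/static/b.jpg">
HTML;

// The pattern from the answer. Inside a single-quoted PHP string only the
// single quote needs a backslash.
$pattern = '~<img [^>]*src=[\'"](?!http)([^\'"]+)[\'"]~';

preg_match_all($pattern, $html, $matches);

// $matches[1] holds what the capturing group matched for each img tag.
$relative = $matches[1];

print_r($relative);
```

The second tag is skipped because its src value starts with "http"; the other two values are captured.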

Of course, for proper parsing you should use DOM ;)


It's not the most useful answer, but it sounds as though you've reached the limit of applicability for regex in HTML parsing.

As per this answer here, look at using an HTML DOM parser. I haven't used PHP DOM parsers much, but I know that in other languages a DOM parser often makes HTML tasks a 30-second job, rather than an hour or more of testing weird exceptional cases.
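
For comparison, here is a rough sketch of the same task using PHP's built-in DOMDocument class. This is one possible DOM approach, not necessarily the parser the linked answer had in mind, and the sample markup is invented for illustration.

```php
<?php
// Hypothetical sample markup, for illustration only.
$html = '<div><img src="images/logo.png">'
      . '<img src="http://example.com/a.png">'
      . '<img alt="no src"></div>';

$doc = new DOMDocument();
// Collect libxml parse warnings internally instead of emitting them,
// since real-world HTML is rarely perfectly valid.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$relative = [];
foreach ($doc->getElementsByTagName('img') as $img) {
    // getAttribute returns an empty string when the attribute is missing.
    $src = $img->getAttribute('src');
    // Keep non-empty src values that do not begin with "http".
    if ($src !== '' && strncasecmp($src, 'http', 4) !== 0) {
        $relative[] = $src;
    }
}

print_r($relative);
```

The parser handles quoting, attribute order and missing attributes, so the only logic left is the prefix check.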
