Regex: Match URLs for specific domain EXCEPT when a certain querystring parameter has a certain value

2022-12-08 22:54

In short, I need to match all URLs in a block of text that are for a certain domain and don't contain a specific querystring parameter and value (refer=twitter).

I have the following regex to match all URLs for the domain.

\b(https?://)?([a-z0-9-]+\.)*example\.com(/[^\s]*)?

I just can't get the last part to work

(?![&?]refer=twitter)\b(https?://)?([a-z0-9-]+\.)*example\.com(/[^\s]*)?

So the following SHOULD match

example.com
http://example.com/
https://www.example.com#link
www.example.com?somevalue=foo

But these should NOT

https://www.anotherexample.com#link
www.example.com?refer=twitter

EDIT: And if you can get it to match the

http://example.com?foo=foo.bar

out of a sentence like

For examples go to http://example.com?foo=foo.bar.

without picking up the period, that would be great!

EDIT2: Fixed the trailing period issue with this

\b(https?://)?([a-z0-9-]+\.)*example\.com/?([^\s]*[^.])?

EDIT3: This seems to work, or at least 99% of the tests I've thrown at it

(?!\b.*[&?]refer=twitter)\b(https?://)?([a-z0-9-]+\.)*example\.com/?([^\s]*[^.])?

EDIT4: Settled on

\b(?!.*[&?]refer=twitter)(https?://)?([a-z0-9-]+\.)*example\.com(?!\.)[^\s]*\b+

(?!\b.*[&?]refer=twitter)

Is what you're looking for.
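
As a rough illustration of how that lookahead behaves, here is a small Python sketch. It makes a few assumptions that are not in the question: the lookahead uses \S* instead of .* so the check stays inside the current URL rather than scanning the rest of the line, the tail accepts ?, # or / right after the domain, and the names URL_RE and find_urls are made up for this example.

```python
import re

# Hypothetical pattern adapted from the question. The negative lookahead
# sits at the start of the candidate URL and rejects it if "refer=twitter"
# appears anywhere before the next whitespace character.
URL_RE = re.compile(
    r"\b(?!\S*[&?]refer=twitter)"       # reject URLs carrying refer=twitter
    r"(?:https?://)?"                   # optional scheme
    r"(?:[a-z0-9-]+\.)*example\.com\b"  # optional subdomains + placeholder domain
    r"(?:[/?#]\S*)?",                   # optional path, query, or fragment
    re.IGNORECASE,
)


def find_urls(text):
    """Return every substring of text that URL_RE matches."""
    return [m.group(0) for m in URL_RE.finditer(text)]


print(find_urls("example.com"))
print(find_urls("https://www.example.com#link"))
print(find_urls("www.example.com?refer=twitter"))
```

Because the lookahead only looks as far as the next whitespace, a URL with refer=twitter later on the same line should not suppress an earlier, unrelated URL. This sketch does not deal with the trailing-period case.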

To be honest, using a regex didn't even cross my mind at first, which is a good sign: IMO a regex should always be a secondary option, not the primary one. Here is how I'd do it in my language of choice, Python:

>>> # Python 3: urlparse and parse_qs live in urllib.parse
>>> from urllib.parse import urlparse, parse_qs
>>> p = urlparse(r'http://foo.bar.com/baz?refer=twitter&rock=paper')
>>> parse_qs(p.query)
{'refer': ['twitter'], 'rock': ['paper']}

You can do anything from here.
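
For instance, one way to go from there is to pull candidate URLs out of the text with a loose pattern and let parse_qs decide which ones to keep. This is only a sketch under assumptions: CANDIDATE_RE and wanted_urls are hypothetical names, the pattern is adapted from the question, and a scheme is prepended to bare hosts purely so that urlparse can separate the query string.

```python
import re
from urllib.parse import urlparse, parse_qs

# Loose candidate pattern adapted from the question; example.com is a placeholder.
CANDIDATE_RE = re.compile(
    r"\b(?:https?://)?(?:[a-z0-9-]+\.)*example\.com\b(?:[/?#]\S*)?",
    re.IGNORECASE,
)


def wanted_urls(text, param="refer", blocked="twitter"):
    """Return candidate URLs whose query string does not set param to blocked."""
    results = []
    for match in CANDIDATE_RE.finditer(text):
        # Drop a sentence-ending period picked up by \S*.
        url = match.group(0).rstrip(".")
        # urlparse needs a scheme to recognise the host, so add one for parsing only.
        parsed = urlparse(url if "://" in url else "http://" + url)
        values = parse_qs(parsed.query).get(param, [])
        if blocked not in values:
            results.append(url)
    return results


print(wanted_urls("For examples go to http://example.com?foo=foo.bar."))
print(wanted_urls("www.example.com?refer=twitter"))
```

Keeping the filter outside the regex means the blocked parameter is compared as a parsed value, so its position in the query string does not matter.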
