
Python regular expression again - match URL

I have such a regular expression:

 re.compile(r"((https?):((//)|(\\\\))+[\w\d:#@%/;$()~_?\+-=\\\.&]*)", re.MULTILINE|re.UNICODE)

But that doesn't include hashbangs (#!). What do I need to change to get it working? I know I can add ! to a group with #@%, etc., but that will select something like

Check this out: http://example.com/something/!!!

And I want to avoid that.


Don't try to make your own regular expression for matching URLs. Use someone else's who has already solved such problems, like this one.


A complete pattern could be very long, but in practice mine works pretty well. Try this one:

((http|https)\:\/\/)?[a-zA-Z0-9\.\/\?\:@\-_=#]+\.([a-zA-Z]){2,6}([a-zA-Z0-9\.\&\/\?\:@\-_=#])*

It matches all of the examples below:

http://wwww.stackoverflow.com
abc.com
http://test.test-75.1474.stackoverflow.com/
stackoverflow.com/
stackoverflow.com
rfordyce@broadviewnet.com
http://www.example.com/etcetc
www.example.com/etcetc
example.com/etcetc
user:pass@example.com/etcetc
(www.itmag.com)
example.com/etcetc?query=aasd
example.com/etcetc?query=aasd&dest=asds
http://stackoverflow.com/questions/6427530/regular-expression-pattern-to-match-url-with
www/Christina.V.Scott@gmail.com
line.lundvoll.nilsen@telemed.no.
s.hossain@unsw.edu.au
s.hossain@unsw.edu.au
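
To make that claim easy to check, here is a minimal sketch that runs the pattern above over a few of the listed examples. The names LOOSE_URL, samples and found are made up for illustration. Note that the pattern is deliberately loose: it also matches e-mail addresses, as the list itself shows.

```python
import re

# The pattern from the answer above, split over two lines for readability.
LOOSE_URL = re.compile(
    r"((http|https)\:\/\/)?[a-zA-Z0-9\.\/\?\:@\-_=#]+\.([a-zA-Z]){2,6}"
    r"([a-zA-Z0-9\.\&\/\?\:@\-_=#])*"
)

# A handful of the examples listed above (not the full list).
samples = [
    "abc.com",
    "http://www.example.com/etcetc",
    "(www.itmag.com)",
    "example.com/etcetc?query=aasd&dest=asds",
    "s.hossain@unsw.edu.au",
]

# Map each sample to the text the pattern actually picks out of it.
found = {}
for sample in samples:
    match = LOOSE_URL.search(sample)
    found[sample] = match.group(0) if match else None
    print(sample, "->", found[sample])
```

For "(www.itmag.com)" the match is the text inside the parentheses, since "(" and ")" are not in either character class.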


This is a common problem. Use default libraries.

For Python, use urlparse (in Python 3 it lives in urllib.parse).
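
A rough sketch of what that could look like in Python 3. Keep in mind that urlparse splits a string you already have into parts; it does not find URLs inside free text. The helper name looks_like_http_url is hypothetical, and checking only the scheme and host is an assumption about what "valid" should mean here.

```python
from urllib.parse import urlparse


def looks_like_http_url(text):
    """Return True if text parses as an absolute http(s) URL with a host."""
    parts = urlparse(text)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


# A hashbang ends up in the fragment, so it needs no special handling.
parts = urlparse("http://example.com/#!/some/page")
print(parts.netloc)    # example.com
print(parts.fragment)  # !/some/page

print(looks_like_http_url("http://example.com/#!/some/page"))  # True
print(looks_like_http_url("example.com/etcetc"))               # False
```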


Based on this link, we can use the library validators.

For example:

import validators

# validators.url() returns True for a valid URL and a falsy failure object otherwise
valid = validators.url('https://codespeedy.com/')
if valid:
    print("URL is valid")
else:
    print("Invalid URL")


I'll admit that I'm a little bit worried about an application that requires a regex like that to match URLs. That said, this seems to work for me:

((https?):((//)|(\\\\))+([\w\d:#@%/;$()~_?\+-=\\\.&](#!)?)*)
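
Here is a quick sketch of how that pattern behaves on the two cases from the question, using the same flags as the original re.compile call. The names HASHBANG_URL and first_url are just for illustration.

```python
import re

# The pattern above: every ordinary URL character may be followed by an
# optional literal "#!", so "!" is only accepted as part of a hashbang.
HASHBANG_URL = re.compile(
    r"((https?):((//)|(\\\\))+([\w\d:#@%/;$()~_?\+-=\\\.&](#!)?)*)",
    re.MULTILINE | re.UNICODE,
)


def first_url(text):
    """Return the first URL found in text, or None if there is none."""
    match = HASHBANG_URL.search(text)
    return match.group(0) if match else None


# Trailing exclamation marks are left out of the match...
print(first_url("Check this out: http://example.com/something/!!!"))
# http://example.com/something/

# ...while a hashbang inside the URL is kept.
print(first_url("See http://example.com/#!/profile now"))
# http://example.com/#!/profile
```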


This is the most complete pattern I use:

URL_PATTERN = r'[A-Za-z0-9]+://[A-Za-z0-9%-_]+(/[A-Za-z0-9%-_])*(#|\\?)[A-Za-z0-9%-_&=]*'


I use this to search for all HTTP and HTTPS URLs. It works like a charm.

URL_PATTERN = r"http[s]*\S+"
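
A small sketch of using it with re.findall. Since \S+ grabs everything up to the next whitespace, hashbangs are included, but so is any punctuation that directly follows the URL. The rstrip step is my own assumption about how you might want to clean that up, not part of the pattern, and the sample text is made up.

```python
import re

URL_PATTERN = r"http[s]*\S+"

text = "Docs at http://example.com/#!/start and https://example.org/a?b=1."

# Everything from "http" up to the next whitespace is taken as the URL.
urls = re.findall(URL_PATTERN, text)
print(urls)
# ['http://example.com/#!/start', 'https://example.org/a?b=1.']

# Assumed clean-up: drop sentence punctuation stuck to the end of a match.
cleaned = [url.rstrip(".,;:!?)") for url in urls]
print(cleaned)
# ['http://example.com/#!/start', 'https://example.org/a?b=1']
```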