Pcrepp - Perl Regular Expression syntax to match host name [duplicate]

This question already has answers here: Closed 12 years ago.

Possible Duplicate: The Hostname Regex

I'm trying to use pcrepp (PCRE) to extract the hostname from a URL. PCRE regular expression syntax is the same as Perl 5 regular expression syntax.

For example:

url = "http://www.pandora.com/#/volume/73";
// the match will be "http://www.pandora.com/".

I can't find the correct syntax of the regex for this example.

  • It needs to work for any URL: amazon.com/sds/ should return amazon.com, and abebooks.co.uk/isbn="62345627457245"/blabla/ should return abebooks.co.uk.
  • I don't need to check whether the URL is valid, just get the hostname.


Something like this:

^(?:[a-z]+://)?[^/]+/?
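
To try this pattern without pcrepp installed, here is a rough sketch using `std::regex` from the C++ standard library as a stand-in (an assumption on my part; the construct used here, a non-capturing group, behaves the same in PCRE). The function name `extract_host` and the extra capture group around `[^/]+` are additions for illustration, so that the bare host comes back without the scheme or the trailing slash.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: returns the host part of `url`,
// or an empty string when the pattern does not match.
std::string extract_host(const std::string& url) {
    // Pattern from the answer, plus one capture group around [^/]+ .
    static const std::regex pattern(R"(^(?:[a-z]+://)?([^/]+)/?)");
    std::smatch m;
    if (std::regex_search(url, m, pattern)) {
        return m[1].str();  // group 1 holds everything up to the first '/'
    }
    return std::string();
}

// Quick self-check against the URLs from the question.
void check_examples() {
    assert(extract_host("http://www.pandora.com/#/volume/73") == "www.pandora.com");
    assert(extract_host("amazon.com/sds/") == "amazon.com");
    assert(extract_host("abebooks.co.uk/isbn=\"62345627457245\"/blabla/") == "abebooks.co.uk");
}
```

Note that the captured text is simply everything before the first slash, so a port or a user:password@ prefix, if present, stays in the result. The scheme part only accepts lowercase letters, as in the original pattern.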


See Regexp::Common::URI::http which uses sub-patterns defined in Regexp::Common::URI::RFC2396. Examining the source code of those modules should give you a good idea how to put together a decent pattern.


Here is one possibility:

^[a-zA-Z0-9\-\.]+\.(com|org|net|mil|edu|COM|ORG|NET|MIL|EDU)$

And another:

^http\://[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(/\S*)?$

These and other URL related regular expressions can be found here: Regular Expression Library


string regex1, regex2, finalRegex;
regex1 = "^((\\w+):\\/\\/\\/?)?((\\w+):?(\\w+)?@)?([^\\/\\?:]+):?(\\d+)?(\\/?[^\\?#;\\|]+)?([;\\|])?([^\\?#]+)?\\??";
regex2 = "([^#]+)?#?(\\w*)";

// concatenation
finalRegex = regex1 + regex2;

The hostname ends up in the sixth capture group. This was answered in another question I asked: Details.
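
For anyone who wants to check the sixth group quickly, here is a minimal sketch that feeds the same two strings to `std::regex` instead of pcrepp (an assumption made so the example needs no extra library; the escapes and character classes used here are also valid in the default ECMAScript grammar). The function name `host_from_group6` is made up for this example.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: runs the concatenated pattern and returns
// capture group 6, or an empty string when nothing matches.
std::string host_from_group6(const std::string& url) {
    // The same two pieces as in the answer, concatenated.
    static const std::string regex1 =
        "^((\\w+):\\/\\/\\/?)?((\\w+):?(\\w+)?@)?([^\\/\\?:]+):?(\\d+)?"
        "(\\/?[^\\?#;\\|]+)?([;\\|])?([^\\?#]+)?\\??";
    static const std::string regex2 = "([^#]+)?#?(\\w*)";
    static const std::regex pattern(regex1 + regex2);

    std::smatch m;
    if (std::regex_search(url, m, pattern)) {
        // Groups 1-2 cover the scheme, 3-5 the user info, 6 the host.
        return m[6].str();
    }
    return std::string();
}

// Quick self-check against the URLs from the question.
void check_group6_examples() {
    assert(host_from_group6("http://www.pandora.com/#/volume/73") == "www.pandora.com");
    assert(host_from_group6("amazon.com/sds/") == "amazon.com");
    assert(host_from_group6("abebooks.co.uk/isbn=\"62345627457245\"/blabla/") == "abebooks.co.uk");
}
```

Unlike the shorter patterns above, this one stops the host at a colon, so a port number lands in group 7 rather than in the host.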
