Pcrepp - Perl Regular Expression syntax to match host name [duplicate]

2022-12-22 01:28 Q&A author:

This question already has answers here: Closed 12 years ago.

Possible Duplicate:
The Hostname Regex

I'm trying to use pcrepp (PCRE) to extract the host name from a URL. PCRE regular expression syntax is largely the same as Perl 5 regular expression syntax.

For example:

url = "http://www.pandora.com/#/volume/73";
// the match should be "http://www.pandora.com/"

I can't work out the correct regex syntax for this example.

It needs to work for any URL: amazon.com/sds/ should return amazon.com, and abebooks.co.uk/isbn="62345627457245"/blabla/ should return abebooks.co.uk.
I don't need to check whether the URL is valid, just to get the host name.

Something like this:

^(?:[a-z]+://)?[^/]+/?

See Regexp::Common::URI::http, which uses sub-patterns defined in Regexp::Common::URI::RFC2396. Examining the source code of those modules should give you a good idea of how to put together a decent pattern.

Here is one possibility:

^[a-zA-Z0-9\-\.]+\.(com|org|net|mil|edu|COM|ORG|NET|MIL|EDU)$

And another:

^http\://[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(/\S*)?$

These and other URL-related regular expressions can be found here: Regular Expression Library

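#include <string>

// Part 1: scheme (group 2), user (4), password (5), host (6), port (7), path (8).
std::string regex1 = "^((\\w+):\\/\\/\\/?)?((\\w+):?(\\w+)?@)?([^\\/\\?:]+):?(\\d+)?(\\/?[^\\?#;\\|]+)?([;\\|])?([^\\?#]+)?\\??";

// Part 2: query string after '?' (group 11) and fragment after '#' (group 12).
std::string regex2 = "([^#]+)?#?(\\w*)";

// Concatenate the two parts into the final pattern.
std::string finalRegex = regex1 + regex2;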

The host name ends up in the sixth capture group. This was answered in another question I asked: Details.
