
Correction required for a regular expression to get the site name

Problem: Extract anything between http://www. and .com, or between http:// and .com.

Solution:

<?php
$url1='http://www.examplehotel.com';
//$url2='http://test-hotel-1.com';
$pattern='@^http://([^/]+).com@i';
preg_match($pattern, $url1, $matches);
print_r($matches);
?>

When $url1 is matched, it should return the string 'examplehotel'.

When $url2 is matched, it should return the string 'test-hotel-1'.

It works correctly for $url2 but not for $url1.

In my pattern I want to allow either http:// or http://www. as the prefix. I added (http://)+(www.)+ but the matches returned are not what I expected.

May I know where I am going wrong?


Try this one:

$pattern='@^http://(?:www\.)?([^\.]+)\.com@i';

Or, in your own pattern, you just need to make www. optional (it may or may not appear in the URL):

$pattern='@^http://(?:www\.)?([^/]+)\.com@i';


The problem is that you are matching everything from the two slashes to the .com. If there is a www., it is matched too, inside your capturing group.

The solution is to match www. optionally before your capturing group, like this:

^http://(?:www\.)?([^/]+)\.com
        ^^^^^^^^^^       ^^

(?:www\.)? is a non-capturing group, i.e. its content is not stored in the result. The ? at the end makes it optional.

\. matches a literal ".". An unescaped . is a special character in regex and means "any character".

See it online on Regexr: when you hover your mouse over the strings, you will see the content of the capturing group.
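
To check the same pattern in PHP rather than in an online tester, a minimal sketch along these lines may help. It reuses the two URLs from the question; the variable names $urls and $names are only illustrative and are not part of the original code.

```php
<?php
// Sample URLs from the question.
$urls = [
    'http://www.examplehotel.com',
    'http://test-hotel-1.com',
];

// The optional "www." sits outside the capturing group,
// and the dot before "com" is escaped so it only matches a literal dot.
$pattern = '@^http://(?:www\.)?([^/]+)\.com@i';

$names = []; // illustrative name, collects the captured site names
foreach ($urls as $url) {
    if (preg_match($pattern, $url, $matches)) {
        // $matches[0] is the whole match, $matches[1] is the first capturing group.
        $names[] = $matches[1];
    }
}

print_r($names);
```

With these two inputs, $names should hold 'examplehotel' and 'test-hotel-1', because the www. prefix is consumed by the non-capturing group before the capture starts.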

Regarding your attempts with [http://] and so on: square brackets create a character class, which matches exactly one of the characters inside the brackets. To group characters, use a capturing group () or a non-capturing group (?:).
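
The difference between a character class and a group can be seen in a short sketch like the one below. The variable names ($classMatch, $groupMatch and so on) are made up for illustration, and the test strings are just the URL from the question and a single letter.

```php
<?php
$url = 'http://www.examplehotel.com';

// Character class: [http://] matches ONE character out of h, t, p, : and /.
preg_match('@^[http://]@', $url, $classMatch);

// Non-capturing group: (?:http://) matches the whole literal sequence "http://".
preg_match('@^(?:http://)@', $url, $groupMatch);

// A single "p" satisfies the character class, but not the group.
$classOnP = preg_match('@^[http://]$@', 'p');
$groupOnP = preg_match('@^(?:http://)$@', 'p');

var_dump($classMatch[0]); // the single character "h"
var_dump($groupMatch[0]); // the full prefix "http://"
var_dump($classOnP);      // 1 (match)
var_dump($groupOnP);      // 0 (no match)
```

This is why putting the prefix in square brackets does not behave like a prefix: the class consumes only one character, whatever its position in the list.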


preg_match_all('%http(?:s)?://(?:www\.)?(.*?)\.com%i', $url, $result, PREG_PATTERN_ORDER);
print_r($result[1]);