Regex for dropping http:// and www. from URLs
I have a bunch of URLs like these:
  $urls = array(
    'https://site1.com',
    'https://www.site2.com',
    'http://www.site3.com',
    'https://site4.com',
    'site5.com',
    'www.site6.com',
    'www.site7.co.uk',
    'site8.tk'
  );
I want to remove the http, https, :// and www. from these strings so that the output looks like this:
  $urls = array(
    'site1.com',
    'site2.com',
    'site3.com',
    'site4.com',
    'site5.com',
    'site6.com',
    'site7.co.uk',
    'site8.tk'
  );
I came up with this solution.
foreach ($urls as $url) {
   $pattern = '/(http[s]?:\/\/)?(www\.)?/i';
   $replace = "";
   echo "before: $url after: ".preg_replace('/\/$/', '', preg_replace($pattern, $replace, $url))."\n";
}
I was wondering how I could avoid the second preg_replace. Any ideas?
preg_replace() also accepts an array as its subject, so you don't even need the loop. You can do this with a one-liner:
$urls = preg_replace('/^(?:https?:\/\/)?(?:www\.)?(.*?)\/?$/i', '$1', $urls);
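Below is a minimal, self-contained check of the array form against the question's sample URLs, plus a hypothetical 'https://www.site9.com/' entry added only to exercise the trailing slash. It uses a lazy (.*?) and a ^ anchor: with a greedy (.*) the group would capture the trailing slash itself, leaving \/? nothing to remove.

```php
<?php
// Sample input from the question, plus one hypothetical entry with a trailing slash.
$urls = array(
    'https://site1.com',
    'https://www.site2.com',
    'http://www.site3.com',
    'https://site4.com',
    'site5.com',
    'www.site6.com',
    'www.site7.co.uk',
    'site8.tk',
    'https://www.site9.com/',
);

// preg_replace() accepts an array subject and returns an array of results.
// ^ anchors at the start; (.*?) is lazy so the optional \/? can claim a final slash.
$clean = preg_replace('/^(?:https?:\/\/)?(?:www\.)?(.*?)\/?$/i', '$1', $urls);

print_r($clean);
```

Because the pattern is anchored with ^, it can match only once per string, so there is no stray empty match at the end of each subject.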
Try this pattern:
/^(https?:\/\/)?(www\.)?(.*?)\/?$/i
and use what is captured in $3. Or, even better, change the first two groups to the non-capturing form (?:...) and use $1.
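To read the capture rather than substitute it, the non-capturing variant can be passed to preg_match(), which fills its third argument with the groups; with the scheme and www. groups non-capturing, the remainder is at index 1. The helper name stripPrefix is hypothetical, and the pattern assumes an optional trailing slash (\/?) with a lazy (.*?) so that URLs with and without a slash both match.

```php
<?php
// Hypothetical helper: returns the URL without scheme, "www." and a trailing slash.
function stripPrefix(string $url): string
{
    // Only (.*?) captures; the scheme and "www." groups are non-capturing (?:...).
    if (preg_match('/^(?:https?:\/\/)?(?:www\.)?(.*?)\/?$/i', $url, $m)) {
        return $m[1];
    }
    // No match (e.g. a string with an embedded newline): return the input unchanged.
    return $url;
}

echo stripPrefix('http://www.site3.com/'), "\n";
echo stripPrefix('site8.tk'), "\n";
```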
Short and sweet:
$urls = preg_replace('~^(?:https?://)?(?:www[.])?~i', '', $urls);
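The pattern above does not touch a trailing slash, which the question's second preg_replace() call handled. A possible variation (a sketch, not part of this answer) that does both in one call adds /$ as a second alternative; because preg_replace() replaces every match in each subject, the prefix and the final slash can both be removed. The sample array here is made up for illustration.

```php
<?php
$urls = array('https://www.site2.com/', 'www.site7.co.uk', 'site8.tk/', 'http://site3.com');

// Alternative 1 strips the scheme and "www." at the very start (^).
// Alternative 2 strips a single slash at the end ($). preg_replace() replaces
// every non-overlapping match, so both can fire on the same string.
$clean = preg_replace('~^(?:https?://)?(?:www[.])?|/$~i', '', $urls);

print_r($clean);
```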
Depending on what exactly it is you want to do, it might be better to stick with PHP's own URL parsing facilities, namely parse_url:
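foreach ($urls as &$url) {
    // For input like 'site5.com', parse_url() sees only a path, so prepend a scheme when none is present
    $host = parse_url(strpos($url, '://') === false ? "http://$url" : $url, PHP_URL_HOST);
    // Drop a leading "www." (the dot is escaped so it only matches a literal ".")
    $url = preg_replace('~^www\.~i', '', $host);
}
unset($url); // break the reference left over from foreach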
parse_url will give you just the host of the URL, even if the URL contains a port number or HTTP authentication data. (Whether this is what you need depends on your exact use case, though.)
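The behaviour described above can be checked with a short script. The URL with credentials and a port is made up for illustration; the last two calls also show that a scheme-less string such as 'site5.com' is parsed as a path, so PHP_URL_HOST returns null for it.

```php
<?php
// Made-up URL with HTTP auth data, a port and a path.
$full = 'http://user:secret@www.site1.com:8080/some/path';

// PHP_URL_HOST returns only the host component as a string.
var_dump(parse_url($full, PHP_URL_HOST));       // string(13) "www.site1.com"
// The port and user info are available as separate components.
var_dump(parse_url($full, PHP_URL_PORT));       // int(8080)
var_dump(parse_url($full, PHP_URL_USER));       // string(4) "user"

// Without a scheme (or a leading "//"), the whole string is parsed as the path,
// so there is no host component and null is returned.
var_dump(parse_url('site5.com', PHP_URL_HOST)); // NULL
var_dump(parse_url('site5.com', PHP_URL_PATH)); // string(9) "site5.com"
```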
 