Extracting top-level domain names from a list of website addresses
I have a list of web addresses, such as those listed below, in my DB.
I need to get the domain name from each address in the list.
http://en.wordpress.com/tag/1000-things-we-hate/
http://en.wordpress.com/tag/1019/
http://en.wordpress.com/tag/1030-am/
http://www.yahoo.com/index.html
http://www.msn.com/index.html
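If the goal is the full host name for every row, the whole list can be processed in one pass with the standard `java.net.URI` class. This is a minimal sketch, assuming the addresses have already been loaded from the DB into a `List<String>`. The class name `HostExtractor` and method `hosts` are illustrative, not from any existing codebase.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;

public class HostExtractor {
    // Returns the host part of each address, e.g. "en.wordpress.com".
    // Addresses that cannot be parsed, or that have no host, are skipped.
    public static List<String> hosts(List<String> addresses) {
        List<String> result = new ArrayList<>();
        for (String address : addresses) {
            try {
                String host = new URI(address).getHost();
                if (host != null) {
                    result.add(host);
                }
            } catch (URISyntaxException e) {
                // Skip malformed entries rather than aborting the whole list
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> addresses = List.of(
            "http://en.wordpress.com/tag/1000-things-we-hate/",
            "http://en.wordpress.com/tag/1019/",
            "http://en.wordpress.com/tag/1030-am/",
            "http://www.yahoo.com/index.html",
            "http://www.msn.com/index.html");
        for (String host : hosts(addresses)) {
            System.out.println(host);
        }
    }
}
```

Letting `URI` do the parsing avoids hard-coding the scheme length, and it also copes with `https://` addresses and addresses without a path.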
Here's a way to do it in Java:
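public class Main {
    public static void main(String[] args) {
        String input = "http://en.wordpress.com/tag/1000-things-we-hate/";
        // Assumes that all URLs start with "http://" (7 characters),
        // so the host begins at index 7 and ends at the next "/"
        int finish = input.indexOf("/", 7);
        if (finish == -1) {
            finish = input.length(); // no path: the host runs to the end
        }
        System.out.println(input.substring(7, finish));
    }
}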
Prints en.wordpress.com
(I assume that is what you want.)
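The substring code above hard-codes the 7-character `http://` prefix. Here is a sketch of the same idea that instead skips past whatever `://` is present, so `https://` addresses also work. The class `SubstringHost` and method `host` are hypothetical names.

```java
public class SubstringHost {
    // Same indexOf/substring approach, but the host starts after "://"
    // (or at index 0 if there is no scheme) instead of at a fixed index 7.
    public static String host(String input) {
        int schemeEnd = input.indexOf("://");
        int start = (schemeEnd == -1) ? 0 : schemeEnd + 3;
        int finish = input.indexOf('/', start);
        if (finish == -1) {
            finish = input.length(); // no path: the host runs to the end
        }
        return input.substring(start, finish);
    }

    public static void main(String[] args) {
        System.out.println(host("https://www.yahoo.com/index.html")); // www.yahoo.com
        System.out.println(host("http://www.msn.com"));               // www.msn.com
    }
}
```

This is still plain string slicing. A port (`:8080`), user info, or a query string with no preceding `/` would end up in the result, which is one reason a real URL parser is usually the safer choice.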
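And in PHP, keeping only the last two parts of the host name:
<?php
$url = "http://en.wordpress.com/tag/1000-things-we-hate/";
// explode() yields ["http:", "", "en.wordpress.com", "tag", ...], so the host is at index 2
$bits = explode("/", $url);
// Split the host into labels: ["en", "wordpress", "com"]
$nextBits = explode(".", $bits[2]);
$count = count($nextBits);
// Keep the last two labels (indices $count-2 and $count-1)
$domain = $nextBits[$count - 2] . "." . $nextBits[$count - 1];
echo $domain; // wordpress.com
?>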
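The PHP snippet above keeps the last two dot-separated labels of the host. The same idea in Java is sketched below, applied to a host name that has already been extracted (for example with the substring code above). The class and method names are hypothetical.

```java
public class LastTwoLabels {
    // Keeps only the last two dot-separated labels of a host name,
    // e.g. "en.wordpress.com" -> "wordpress.com".
    // Naive: gives the wrong answer for multi-label suffixes such as "co.uk".
    public static String lastTwoLabels(String host) {
        String[] labels = host.split("\\.");
        if (labels.length < 2) {
            return host; // e.g. "localhost": nothing to strip
        }
        return labels[labels.length - 2] + "." + labels[labels.length - 1];
    }

    public static void main(String[] args) {
        System.out.println(lastTwoLabels("en.wordpress.com")); // wordpress.com
        System.out.println(lastTwoLabels("www.yahoo.com"));    // yahoo.com
    }
}
```

For a host such as `www.bbc.co.uk`, this returns `co.uk`, which is not what you want. That limitation is why the heuristic only works for simple cases.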
<?php
$url = "http://en.wordpress.com/tag/1000-things-we-hate/";
echo parse_url($url, PHP_URL_HOST);
?>
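That prints "en.wordpress.com". If you don't want subdomains (i.e. only "wordpress.com"), things get complicated. You cannot simply keep the last two labels, because some registries use multi-label suffixes such as co.uk (taking the last two labels of www.bbc.co.uk gives co.uk). You would need a library that knows these registered-domain rules, such as http://www.dkim-reputation.org/regdom-libs/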
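To show why a dedicated library is needed, here is a rough sketch of the kind of lookup such libraries perform: find the longest known public suffix and keep one label in front of it. The four-entry suffix set is a hypothetical stand-in for illustration only. Real code should load the full Public Suffix List (publicsuffix.org) or use an existing library rather than a hard-coded set.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;

public class RegisteredDomain {
    // HYPOTHETICAL stand-in for the real public suffix list; illustration only.
    private static final Set<String> SUFFIXES = Set.of("com", "org", "uk", "co.uk");

    // Returns the longest known suffix plus the one label in front of it,
    // or null when no suffix matches or the host is itself a public suffix.
    public static String registeredDomain(String host) {
        String[] labels = host.toLowerCase(Locale.ROOT).split("\\.");
        // i = 0 tests the whole host; increasing i tries progressively
        // shorter suffixes, so the first match is the longest one.
        for (int i = 0; i < labels.length; i++) {
            String suffix = String.join(".", Arrays.copyOfRange(labels, i, labels.length));
            if (SUFFIXES.contains(suffix)) {
                // A host that is itself a suffix has no registered domain.
                return (i == 0) ? null : labels[i - 1] + "." + suffix;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(registeredDomain("en.wordpress.com")); // wordpress.com
        System.out.println(registeredDomain("www.bbc.co.uk"));    // bbc.co.uk
    }
}
```

The real list also contains wildcard and exception rules, which this sketch ignores.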
Use the parse_url function in PHP.
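Without its second argument, PHP's parse_url returns an associative array of all the components it finds (scheme, host, path, and so on). For comparison, here is a sketch of the analogous Java approach with `java.net.URI`: parse once, then read the components back. The `describe` method and its output format are made up for illustration.

```java
import java.net.URI;

public class UrlParts {
    // Parses the address once and reads several components from the result.
    public static String describe(String address) {
        URI uri = URI.create(address); // throws IllegalArgumentException if malformed
        return "scheme=" + uri.getScheme()
            + " host=" + uri.getHost()
            + " path=" + uri.getPath();
    }

    public static void main(String[] args) {
        System.out.println(describe("http://www.yahoo.com/index.html"));
        // scheme=http host=www.yahoo.com path=/index.html
    }
}
```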