
How can we find the domain name using MySQL and regular expressions?

I have a list of domains in the DB, like

http://www.masn.com/index.html

http://www.123musiq.com/index.html, etc.

What I need as output is

http://www.masn.com

http://www.123musiq.com

How can I do that with a regular expression?


In MySQL versions before 8.0, regular expressions can match but not return substrings.

You can use SUBSTRING_INDEX:

SELECT  SUBSTRING_INDEX('www.example.com', '/', 1)

However, this is not safe for URLs that carry a protocol prefix such as http://.

If you are using a mix of prefixed and unprefixed URLs, use this:

SELECT  url RLIKE '^http://',
        CASE
        WHEN url RLIKE '^http://' THEN
                SUBSTRING_INDEX(SUBSTRING_INDEX(url, '/', 3), '/', -1)
        ELSE
                SUBSTRING_INDEX(url, '/', 1)
        END
FROM    (
        SELECT   'www.example.com/test/test' AS url
        UNION ALL
        SELECT   'http://www.example.com/test'
        ) q
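
On MySQL 8.0 or later, REGEXP_SUBSTR returns the matched text itself, so a regular expression can answer the original question directly. A minimal sketch, assuming MySQL 8.0+ and that every value starts with http:// or https:// (the sample URLs are the ones from the question, and the alias names are only illustrative):

```sql
-- Assumes MySQL 8.0 or later, where REGEXP_SUBSTR is available.
-- The pattern keeps the scheme plus everything up to the first '/' after it.
SELECT  REGEXP_SUBSTR(url, '^https?://[^/]+') AS site
FROM    (
        SELECT   'http://www.masn.com/index.html' AS url
        UNION ALL
        SELECT   'http://www.123musiq.com/index.html'
        ) q;
```

This should give http://www.masn.com and http://www.123musiq.com. For a value without a scheme the pattern does not match and REGEXP_SUBSTR returns NULL, so mixed data still needs the CASE approach.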


Use SUBSTRING_INDEX:

http://dev.mysql.com/doc/refman/5.1/en/string-functions.html#function_substring-index

For example:

SELECT  SUBSTRING_INDEX(urlfield, '/', 1) FROM mytable;
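
Note that a count of 1 only works for values stored without a scheme. For the URLs in the question, the first '/' belongs to http://, so a count of 3 is needed to keep the scheme and host. A small sketch of the difference, using one of the question's URLs as a literal (the column aliases are made up for illustration):

```sql
-- Splitting on '/' gives: 'http:', '', 'www.masn.com', 'index.html'
SELECT  SUBSTRING_INDEX('http://www.masn.com/index.html', '/', 1) AS count_1,  -- 'http:'
        SUBSTRING_INDEX('http://www.masn.com/index.html', '/', 3) AS count_3;  -- 'http://www.masn.com'
```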


SELECT SUBSTRING_INDEX(SUBSTRING_INDEX('http://www.domain.com/', '://', -1),'/', 1);

Result: www.domain.com

SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(SUBSTRING_INDEX('http://www.domain.com/', '://', -1),'/',1),'www.', -1);

Result: domain.com


Based on these answers, I came up with a similar solution, but it requires multiple queries.

SELECT SUBSTRING_INDEX(url,'/',1) FROM `table` WHERE url NOT REGEXP '^[^:]+://';
SELECT SUBSTRING_INDEX(url,'/',3) FROM `table` WHERE url REGEXP '^[^:]+://';

The first query handles URLs without a protocol prefix. The second query handles URLs with a protocol prefix. Please note that these do not handle every valid URL, but should handle most proper URLs.
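
If a single result set is preferred, the two queries can probably be folded into one with a CASE expression. This is only a sketch under the same assumptions as above; the inline sample rows and the alias names are illustrative, not part of the original answer:

```sql
-- One pass over the data: pick the split count depending on
-- whether the value starts with a scheme such as http://.
SELECT  url,
        CASE
        WHEN url REGEXP '^[^:]+://' THEN
                SUBSTRING_INDEX(url, '/', 3)   -- keeps scheme and host
        ELSE
                SUBSTRING_INDEX(url, '/', 1)   -- host only, no scheme stored
        END AS site
FROM    (
        SELECT   'www.example.com/test/test' AS url
        UNION ALL
        SELECT   'http://www.example.com/test'
        ) q;
```

For the two sample rows this should return www.example.com and http://www.example.com respectively.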


If you're not afraid of installing MySQL extensions (UDFs), then there's a UDF you can use that does exactly this while respecting different top-level domains like "google.com" and "google.co.uk", and handles a whole ton of other edge cases:

https://github.com/StirlingMarketingGroup/mysql-get-etld-p1

select`get_etld_p1`('http://a.very.complex-domain.co.uk:8080/foo/bar');-- 'complex-domain.co.uk'
select`get_etld_p1`('https://www.bbc.co.uk/');-- 'bbc.co.uk'
select`get_etld_p1`('https://github.com/StirlingMarketingGroup/');-- 'github.com'
select`get_etld_p1`('https://localhost:10000/index');-- 'localhost'
select`get_etld_p1`('android-app://com.google.android.gm');-- 'com.google.android.gm'
select`get_etld_p1`('example.test.domain.com');-- 'domain.com'
select`get_etld_p1`('postgres://user:pass@host.com:5432/path?k=v#f');-- 'host.com'
select`get_etld_p1`('exzvk.omsk.so-ups.ru');-- 'so-ups.ru'
select`get_etld_p1`('http://10.64.3.5/data_check/index.php?r=index/rawdatacheck');-- '10.64.3.5'
select`get_etld_p1`('not a domain');-- null


I had a similar problem but some of the data had query parameters without a slash.

SUBSTRING_INDEX(SUBSTRING_INDEX(urlfield, '/', 3) , '?', 1)

This worked for me and kept https:// and http:// as I needed the URL schemes to be correct.
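
To illustrate why the outer call matters, here is a small sketch with a made-up URL whose query string follows the host directly, with no slash in between:

```sql
-- The inner call finds fewer than three '/' delimiters, so it returns the
-- whole string; the outer call then cuts everything from the first '?'.
SELECT  SUBSTRING_INDEX('https://www.example.com?x=1', '/', 3) AS inner_only,   -- 'https://www.example.com?x=1'
        SUBSTRING_INDEX(
            SUBSTRING_INDEX('https://www.example.com?x=1', '/', 3),
            '?', 1) AS with_outer;                                              -- 'https://www.example.com'
```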
