Format a list of URLs in MySQL
I have a list of a million or so URLs in a MySQL table.
I need to cleanse the data (extract domains) so I can be confident about DISTINCT-type queries.
The data comes in several different forms:
www.domain.tld
domain.tld
http://domain.tld
https://vhost.domain.tld
domain.tld/
There are invalid domains and empty data.
Ideally I'd like to do something along the lines of:
UPDATE table1 SET domain = website REGEXP '^(https?://)?[a-zA-Z0-9\\.\\-]+(/|$|\\?)'
Here domain is a new, empty field and website holds the original URL.
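You can't use a regex like that in MySQL as is: REGEXP only tests whether a string matches and returns 1 or 0, so it cannot extract or replace the matched text. Apparently you can install some UDFs that implement that, though. See: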
- How to do a regular expression replace in MySQL?
- https://launchpad.net/mysql-udf-regexp
- http://www.mysqludf.org/lib_mysqludf_preg/
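If installing a UDF is not an option, one alternative is to do the extraction outside MySQL and write the result back. The following is only a minimal sketch in Python of the same idea as the pattern in the question. The function name extract_domain, the strip_www flag, and the rules for what counts as a valid domain are assumptions made for illustration, not part of the original question.

```python
import re

# Same idea as the pattern in the question: an optional http(s) scheme,
# then a run of host characters, ended by "/", "?", ":", "#" or end of string.
HOST_RE = re.compile(r"^(?:https?://)?([a-z0-9.-]+)(?:[/?:#]|$)", re.IGNORECASE)


def extract_domain(website, strip_www=True):
    """Return the lower-cased host part of `website`, or None if none is found."""
    if website is None:
        return None
    match = HOST_RE.match(website.strip())
    if match is None:
        return None
    host = match.group(1).lower().strip(".")
    # Assumption: a usable domain has at least one dot and no empty labels.
    if "." not in host or ".." in host:
        return None
    # Assumption: "www.domain.tld" and "domain.tld" count as the same domain.
    if strip_www and host.startswith("www.") and host.count(".") >= 2:
        host = host[4:]
    return host


if __name__ == "__main__":
    samples = ["www.domain.tld", "domain.tld", "http://domain.tld",
               "https://vhost.domain.tld", "domain.tld/", "", "not a domain"]
    for sample in samples:
        print(repr(sample), "->", extract_domain(sample))
```

Rows for which the function returns None (empty or invalid data) can be left with a NULL domain, and the remaining values written back to the domain column with an ordinary parameterized UPDATE. Whether the leading "www." should be dropped depends on how the DISTINCT queries are meant to group hosts.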