Turn a set of URLs into a regex pattern (optional patterns)
Given an arbitrary set of URLs (e.g. the list at http://api.longurl.org/v2/services), what is the best way to turn this list into a regex?
Is this an appropriate regex?
(((easyuri|eepurl|eweri)\.com)|((migre|mke|myloc)\.me)|etc...)
Can you do multiple levels of optional patterns like that?
I see different ways to accomplish this.
- Use XPath and try to select a node given the current URL.
- Parse the XML into a dictionary and test whether your current URL exists as a key.
- Store the domains from the XML in a database, index the URL field and query it for your current URL.
- If performance is not an issue: Match the current URL against the entire XML file as text.
- Perhaps there are more ideas.
Building a regex from the XML does not seem like a good idea to me, since all the other solutions look far easier to develop.
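As a rough illustration of the dictionary option from the list above, here is a minimal PHP sketch. It assumes the domains have already been extracted from the XML into a plain array; the sample domains and the helper name isShortener() are made up for the example.

```php
<?php
// Hypothetical sample; in practice this list would come from the parsed XML.
$urlShorteners = array('0rz.tw', '1link.in', 'easyuri.com', 'migre.me');

// Flip the list so each domain becomes a key; isset() is then a hash lookup.
$shortenerSet = array_flip($urlShorteners);

// isShortener() is a hypothetical helper name, not part of any library.
function isShortener($url, array $shortenerSet) {
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return false; // no host component (relative or malformed URL)
    }
    return isset($shortenerSet[strtolower($host)]);
}

var_dump(isShortener('http://migre.me/abc', $shortenerSet));   // bool(true)
var_dump(isShortener('http://migre.memo/abc', $shortenerSet)); // bool(false)
```

Because the lookup compares the whole host, a name like migre.memo is not mistaken for migre.me.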
OP'S ANSWER:
Well it turns out that this does work:
/((?:easyuri|eepurl|eweri)\.com)|((?:migre|mke|myloc)\.me)/
Run against this:
easyuri.com eepurl.comer eweri.us migre.me mke.memo myloc.em
You get this:
[0] => Array
    (
        [0] => easyuri.com
        [1] => eepurl.com
        [2] => migre.me
        [3] => mke.me
    )
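Note that the result above includes eepurl.com and mke.me even though the input contained eepurl.comer and mke.memo, because the pattern is not anchored. The sketch below reproduces the result with preg_match_all() and then tries a variant with \b word boundaries. The stricter pattern is my own assumption about what is wanted, not something from the original test.

```php
<?php
$text = 'easyuri.com eepurl.comer eweri.us migre.me mke.memo myloc.em';

// The pattern from the answer: unanchored, so it also matches inside longer names.
$loose = '/((?:easyuri|eepurl|eweri)\.com)|((?:migre|mke|myloc)\.me)/';
preg_match_all($loose, $text, $matches);
print_r($matches[0]); // easyuri.com, eepurl.com, migre.me, mke.me

// Assumed variant: \b requires a word boundary on both sides of the domain.
$strict = '/\b(?:(?:easyuri|eepurl|eweri)\.com|(?:migre|mke|myloc)\.me)\b/';
preg_match_all($strict, $text, $strictMatches);
print_r($strictMatches[0]); // easyuri.com, migre.me
```

A word boundary still accepts a dot after the domain (as in easyuri.com.example.org), so for real URLs it is probably safer to extract the host first and compare that.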
But the easiest way would just be something like this:
/0rz\.tw|1link\.in|1url\.com|2\.gp|2big\.at|etc\.\.\./
Regex helps you complicate things more than is possible with other methods. ;P
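For the flat pattern above, a minimal sketch of how it could be generated: escape each domain with preg_quote() and join the results with "|". The five sample domains are just the ones visible in the pattern; the real list would be longer.

```php
<?php
// Sample list taken from the pattern above; the real one comes from the service.
$urlShorteners = array('0rz.tw', '1link.in', '1url.com', '2.gp', '2big.at');

// preg_quote() escapes regex metacharacters (the dots here) and the "/" delimiter.
$escaped = array();
foreach ($urlShorteners as $domain) {
    $escaped[] = preg_quote($domain, '/');
}

$regex = '/' . implode('|', $escaped) . '/';
echo $regex . "\n"; // /0rz\.tw|1link\.in|1url\.com|2\.gp|2big\.at/
```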
Here's the PHP I eventually used to create the grouped regex:
    // Assumes that you have cURL'd http://api.longurl.org/v2/services and
    // converted the XML to an array called $urlShorteners, like:
    // $urlShorteners = array('0rz.tw', '1link.in', 'etc...');

    // Split each domain into its labels, reversed so the TLD comes first.
    $urls = array();
    foreach ($urlShorteners as $url) {
        $urls[] = array_reverse(explode('.', $url));
    }

    // Group the remaining labels by TLD.
    $tldKeys = array();
    foreach ($urls as $url) {
        $tld = array_shift($url);
        $tldKeys[$tld][] = $url;
    }

    // Build one alternative per TLD, e.g. ((?:easyuri|eepurl)\.com).
    $optionPattern = array();
    foreach ($tldKeys as $tld => $doms) {
        if ($tld != '') {
            $subPattern = array();
            foreach ($doms as $subDomain) {
                $subPattern[] = implode('\.', array_reverse($subDomain));
            }
            if (count($subPattern) > 1) {
                $optionPattern[] = '((?:' . implode('|', $subPattern) . ')\.' . $tld . ')';
            } else {
                $optionPattern[] = '(' . $subPattern[0] . '\.' . $tld . ')';
            }
        }
    }

    $regex = '/' . implode('|', $optionPattern) . '/';
    echo $regex . "\n";