How do I extract info from a block of URLs in PHP?
I have a list of URLs, which can come in any format: one per line, separated by commas, with random text in between them, etc. The URLs are all from 2 different sites and have a similar structure.
For this example, let's say it looks like this:
Random Text - http://www.domain2.com/variable-value
Random Text 2 - http://www.domain1.com/variable-value, http://www.domain1.com/variable-value, http://www.domain1.com/variable-value
http://www.domain1.com/variable-value
http://www.domain2.com/variable-value
http://www.domain1.com/variable-value http://www.domain2.com/variable-value http://www.domain1.com/variable-value
I need to extract 2 pieces of information: whether the URL is from domain1 or domain2, and the value that follows "variable-".
So it should create a multi-dimensional array, which would have 2 items: domain + value.
What's the best way of doing that?
This is one possibility for extracting the URLs. The only restriction is that the URLs themselves must not contain a comma. If that is acceptable, this should be enough:
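// "\n" must be in double quotes; '\n' in single quotes is a literal backslash + n
$lines = explode("\n", $urls);
foreach ($lines as $line)
{
    // The pattern needs delimiters (here /); [^,\s] stops each URL at a comma or whitespace
    if (preg_match_all("/http:\\/\\/[^,\\s]*variable-([^,\\s]+)/", $line, $matches))
    {
        // $matches[0] holds the full URLs, $matches[1] the values after "variable-"
    }
}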
By the way, the matches are stored in the $matches array.
P.S.: Edited. I forgot to escape the backslash, and you should search the string line by line to ensure correct behaviour. You can test the regex at http://www.regex-tester.de/regex.html; it worked there with my regex.
P.P.S.: After further research I found this page: http://internet.ls-la.net/folklore/url-regexpr.html. It contains a regular expression for a URL. You could use it to extract the URLs first, and in a second step go through the URLs and extract the variable information by looking for e.g. variable-(\w+).
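The two-step idea from the P.P.S. could look roughly like the sketch below. The sample text and the simplified URL pattern are assumptions made for illustration; the full regular expression from the linked page is not reproduced here.

```php
<?php
// Hypothetical sample input; the real block of text would come from elsewhere.
$text = "Random Text - http://www.domain2.com/variable-abc\n"
      . "http://www.domain1.com/variable-def, http://www.domain1.com/variable-ghi";

// Step 1: pull out everything that looks like a URL.
// Simplified pattern (an assumption): a URL ends at whitespace or a comma.
preg_match_all('#https?://[^\s,]+#i', $text, $found);

// Step 2: look for the value after "variable-" in each extracted URL.
$values = array();
foreach ($found[0] as $url) {
    if (preg_match('#variable-(\w+)#', $url, $m)) {
        $values[] = $m[1];
    }
}

print_r($values);
```

Keeping the two steps separate means the URL pattern can be swapped for a stricter one later without touching the value extraction.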
Use preg_split, preg_match and parse_url:
// split urls
$urls = preg_split('!,\s+!', 'http://www.domain1.com/variable-value, http://www.domain2.com/variable-value, http://www.domain3.com/variable-value');
// check for domain and path variable
foreach ($urls as $url) {
    $parts = parse_url($url);
    // check domain, e.g. www.domain1.com
    $host = $parts['host'];
    $matches = array();
    // check path: on success $matches[1] holds the value after "variable-"
    if (preg_match('!^/variable-([^/]+)!', $parts['path'], $matches)) {
        // use $host and $matches[1] here
    }
}
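To cover the mixed input from the question (commas, spaces, newlines, and random text), the same parse_url idea can be wrapped in a function that returns the domain + value pairs. This is only a sketch: the function name extractDomainValues and the array keys 'domain' and 'value' are hypothetical, and it assumes URLs never contain whitespace or commas.

```php
<?php
// Hypothetical helper: returns a list of array('domain' => ..., 'value' => ...).
function extractDomainValues($text)
{
    $result = array();
    // Split on any run of commas and/or whitespace so every separator style is handled.
    $tokens = preg_split('![,\s]+!', $text, -1, PREG_SPLIT_NO_EMPTY);
    foreach ($tokens as $token) {
        $parts = parse_url($token);
        // Skip random text: anything that does not parse into both a host and a path.
        if (!is_array($parts) || !isset($parts['host'], $parts['path'])) {
            continue;
        }
        // Keep only paths of the form /variable-<value>.
        if (preg_match('!^/variable-([^/]+)!', $parts['path'], $m)) {
            $result[] = array('domain' => $parts['host'], 'value' => $m[1]);
        }
    }
    return $result;
}

$pairs = extractDomainValues(
    "Random Text 2 - http://www.domain1.com/variable-a, http://www.domain2.com/variable-b\n"
    . "http://www.domain1.com/variable-c"
);
print_r($pairs);
```

Checking for domain1 versus domain2 then comes down to comparing the 'domain' entry of each pair against the two expected host names.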
$text = "http://www.domain1.com/variable-value1, http://www.domain2.com/variable-value2 http://www.domain1.com/variable-value3";
preg_match_all("/http:\\/\\/(.+?)\\/variable-([a-z0-9]+)/si", $text, $matches);
print_r($matches);
Result:
Array
(
    [0] => Array
        (
            [0] => http://www.domain1.com/variable-value1
            [1] => http://www.domain2.com/variable-value2
            [2] => http://www.domain1.com/variable-value3
        )
    [1] => Array
        (
            [0] => www.domain1.com
            [1] => www.domain2.com
            [2] => www.domain1.com
        )
    [2] => Array
        (
            [0] => value1
            [1] => value2
            [2] => value3
        )
)
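The result above is grouped by capture group, so domains and values sit in separate sub-arrays. If one entry per URL is preferred, as the question asks, the PREG_SET_ORDER flag of preg_match_all groups the captures per match instead. A minimal sketch follows; the keys 'domain' and 'value' and the slightly tightened host pattern are my own choices, not part of the answer above.

```php
<?php
$text = "http://www.domain1.com/variable-value1, http://www.domain2.com/variable-value2 http://www.domain1.com/variable-value3";

// PREG_SET_ORDER: $sets[n] holds the full match and its groups for the n-th URL.
// The host group stops at a slash, whitespace, or comma (an assumption about the input).
preg_match_all('#http://([^/\s,]+)/variable-([a-z0-9]+)#i', $text, $sets, PREG_SET_ORDER);

$pairs = array();
foreach ($sets as $set) {
    $pairs[] = array('domain' => $set[1], 'value' => $set[2]);
}

print_r($pairs);
```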