PHP & Regex: Adding website URL to images
I have the following code on my website. It finds the images in a block of HTML whose src doesn't start with http:// or /. For those images, it adds the website URL to the front of the image source.
For example:
<img src="http://domain.com/image.jpg"> will stay the same
<img src="/image.jpg"> will stay the same
<img src="image.jpg"> will be changed to <img src="http://domain.com/image.jpg">
I feel my code is really inefficient... Any ideas on how I could make it run with less code?
preg_match_all('/<img[\s]+[^>]*src\s*=\s*[\"\']?([^\'\" >]+)[\'\" >]/i', $content_text, $matches);
if (isset($matches[1])) {
    foreach($matches[1] AS $link) {
        if (!preg_match("/^(https?|ftp)\:\/\//sie", $link) && !preg_match("/^\//sie", $link)) {
            $full_link = get_option('siteurl') . '/' . $link;
            $content_text = str_replace($link, $full_link, $content_text);
        }
    }
}
For a start you could stop using regular expressions to process HTML, particularly when what you're doing is so easily done with an HTML parser (of which PHP has at least 3). For example:
$dom = new DOMDocument;
$dom->loadHTML($html);
$images = $dom->getElementsByTagName('img');
foreach ($images as $image) {
    $src = $image->getAttribute('src');
    $url = parse_url($src);
    // http_build_url() comes from the pecl_http (1.x) extension, not the PHP core
    $image->setAttribute('src', http_build_url('http://www.example.com', $url));
}
$html = $dom->saveHTML();
Problem solved. Well, almost. The case where you add the hostname to relative URLs but not to those beginning with / is a little puzzling and not handled in this snippet, but it's a relatively minor change (it involves checking $url['path']).
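As a rough illustration of that minor change, here is a minimal sketch that only touches src values with no scheme and no leading slash. The function name absolutize_image_srcs and the $baseUrl parameter are hypothetical, and plain string concatenation stands in for http_build_url() so that no extra extension is assumed. Note that loadHTML() wraps fragments in html/body tags, so the returned markup is a full document.

```php
<?php
// Hypothetical helper (not from the original answer): prefixes $baseUrl to
// <img> src values that have neither a scheme nor a leading slash.
function absolutize_image_srcs(string $html, string $baseUrl): string
{
    $dom = new DOMDocument();

    // Real-world markup is rarely valid; collect libxml warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName('img') as $image) {
        $src = $image->getAttribute('src');
        if ($src === '') {
            continue; // no src attribute, nothing to rewrite
        }

        $url = parse_url($src);
        if ($url === false) {
            continue; // seriously malformed URL, leave it alone
        }

        $hasScheme    = isset($url['scheme']);      // e.g. http://, ftp://
        $rootRelative = strpos($src, '/') === 0;    // e.g. /image.jpg

        if (!$hasScheme && !$rootRelative) {
            $image->setAttribute('src', rtrim($baseUrl, '/') . '/' . $src);
        }
    }

    return $dom->saveHTML();
}
```

Called as absolutize_image_srcs($content_text, get_option('siteurl')), this should leave the first two examples from the question untouched and rewrite only the third.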
See Parse HTML With PHP And DOM, the Document Object Model, parse_url() and http_build_url(). PHP has much better tools for this than regular expressions.
Oh, and for good measure, read Parsing Html The Cthulhu Way.
A completely different approach might work, too:
<base href="http://domain.com/" />
Trying to match HTML with regular expressions is very difficult.
Even though your code may seem to work, there is a good chance that some IMG tags will slip through as they are not in the exact format you have described.
This isn't tested, but I'm thinking something like this...
preg_match_all('/<img\b[^>]*\bsrc\s*=\s*[\'"]?([^\'">]*)/i', $content_text, $matches);
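If staying with regular expressions, the match and the replacement can be folded into one pass. The following is only a sketch along the lines of the pattern above: the function name prefix_relative_img_srcs and the $baseUrl parameter are made up for illustration, and the pattern shares the usual fragility of matching HTML with a regex.

```php
<?php
// Hypothetical helper: rewrites <img> src values that start with neither a
// URL scheme (http:, ftp:, data:, ...) nor a slash, in a single pass.
function prefix_relative_img_srcs(string $html, string $baseUrl): string
{
    // Group 1: everything from "<img" up to and including the optional
    //          opening quote of the src value.
    // Lookahead: reject values that begin with "scheme:" or "/".
    // Group 2: the relative src value itself.
    $pattern = '/(<img\b[^>]*?\ssrc\s*=\s*[\'"]?)(?![a-z][a-z0-9+.-]*:|\/)([^\'"\s>]+)/i';

    return preg_replace_callback(
        $pattern,
        function (array $m) use ($baseUrl) {
            return $m[1] . rtrim($baseUrl, '/') . '/' . $m[2];
        },
        $html
    );
}
```

Because only the matched src value is rewritten, this avoids the str_replace() call in the question, which would also replace the same string anywhere else it occurs in the content.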