Regex to extract images from HTML - how to get only JPGs?
I am using this PHP function to grab all <img> tags within any given HTML.
function extract_images($content)
{
    $img = strip_tags(html_entity_decode($content),'<img>');
    $regex = '~src="[^"]*"~';
    preg_match_all($regex, $img, $all_images);
    return $all_images;
}
This works and returns all images (gif, png, jpg, etc).
Does anyone know how to change the regex...
~src="[^"]*"~
in order to only get files with JPG or JPEG extension?
Thanks a bunch.
Sooner or later the Regex Enforcement Agency will show up. It might as well be me :)
The proper way to do this is with an HTML DOM parser. Here's a DOMDocument solution. It is more robust than parsing the HTML with a regex, and it also lets you read or modify other attributes on your <img> nodes at the same time.
$dom = new DOMDocument();
$dom->loadHTML($content);
// To hold all your links...
$links = array();
// Get all images
$imgs = $dom->getElementsByTagName("img");
foreach ($imgs as $img) {
    // Check the src attr of each img
    $src = $img->getAttribute("src");
    if (preg_match("/\.jpe?g$/i", $src)) {
        // Add it onto your $links array.
        $links[] = $src;
    }
}
See other answers for the simple regex solution, or adapt from the regex inside my foreach loop.
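In case it helps, here is a rough sketch of the DOMDocument approach wrapped in a reusable function. The function name extract_jpg_sources is made up for illustration. The libxml_use_internal_errors() call and the parse_url() check are my own additions, not part of the snippet above: the first assumes you want to silence the warnings that loadHTML() tends to emit on real-world markup, the second assumes you also want to catch URLs that carry a query string such as photo.jpg?w=200.

```php
<?php
// Hypothetical helper: returns the src of every <img> whose URL path ends in .jpg/.jpeg.
function extract_jpg_sources(string $html): array
{
    $links = [];
    if (trim($html) === '') {
        return $links; // loadHTML() does not accept an empty string
    }

    // Assumption: collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        // Assumption: test only the path part, so a query string does not hide the extension.
        $path = parse_url($src, PHP_URL_PATH);
        if (is_string($path) && preg_match('/\.jpe?g$/i', $path)) {
            $links[] = $src;
        }
    }

    return $links;
}
```

Calling extract_jpg_sources($content) gives you a flat array of src values. If you do not care about query strings, you can drop the parse_url() line and match against $src directly, as in the loop above.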
/src="[^"]*\.(jpg|jpeg)"/i
The trailing i makes the match case-insensitive, so .JPG and .JPEG are matched as well.
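For what it's worth, this is roughly how that pattern could be dropped into the function from the question. Treat it as a sketch: the name extract_jpg_images and the capture group are my own additions, assuming you want the bare URLs back rather than the whole src="..." string. Like the original, it only sees double-quoted src attributes.

```php
<?php
// Hypothetical variant of the question's extract_images(), limited to JPG/JPEG files.
function extract_jpg_images(string $content): array
{
    $img = strip_tags(html_entity_decode($content), '<img>');
    // Group 1 captures the URL itself, without the surrounding src="...".
    $regex = '~src="([^"]*\.jpe?g)"~i';
    preg_match_all($regex, $img, $matches);

    return $matches[1];
}
```

Here jpe?g is just a shorter way of writing (jpg|jpeg). Returning $matches[1] instead of the full match array means the caller gets a flat list of URLs.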