Why is this regular expression not working?
Content of 1.txt:
Image" href="images/product_images/original_images/9961_1.jpg" rel="disable-zoom:false; disable-expand: false"><img src="im
Code that does not work:
<?php
$pattern = '/(images\/product_images\/original_images\/)(.*)(\.jpg)/i';
$result = file_get_contents("1.txt");
preg_match($pattern,$result,$match);
echo "<h3>Preg_match Pattern test:</h3><br><br><pre>";
print_r($match);
echo "</pre>";
?>
I expect this result:
Array
(
[0] => images/product_images/original_images/9961_1.jpg
[1] => images/product_images/original_images/
[2] => 9961_1
[3] => .jpg
)
But I get something like this instead:
Array
(
[0] => images/product_images/original_images/9961_1.jpg" rel="disable-zoom:false; disable-expand: false">
[1] => images/product_images/original_images/
[2] => 9961_1.jpg" rel="disable-zoom:false; disable-expand: false">
)
I'm tired of trying a million combinations of this regexp. I don't know what's wrong. Please help, and thanks a lot!
Make it ungreedy:
$pattern = '/(images\/product_images\/original_images\/)(.*?)(\.jpg)/i';
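Remember that regular expression quantifiers are greedy by default. Your second capture, (.*), says to match any character except a newline (unless the s "dotall" modifier is set), as many times as possible. So it is probably running past the first .jpg and stopping only at a later one on the same line.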
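To see the difference between the greedy and ungreedy quantifier, here is a minimal sketch. The input line is hypothetical: the sample in the question is cut off, so the second image (thumb.jpg) is only an assumption made to reproduce the effect, and the shortened rel value is likewise invented.

```php
<?php
// Hypothetical input: a single line that contains ".jpg" twice.
$line = 'href="images/product_images/original_images/9961_1.jpg" rel="x"><img src="thumb.jpg">';

// Same pattern twice; only the quantifier in the second group differs.
$greedy = '/(images\/product_images\/original_images\/)(.*)(\.jpg)/i';
$lazy   = '/(images\/product_images\/original_images\/)(.*?)(\.jpg)/i';

preg_match($greedy, $line, $g);
preg_match($lazy, $line, $l);

// Greedy: .* runs to the end of the line, then backtracks to the LAST ".jpg".
echo $g[2], "\n"; // 9961_1.jpg" rel="x"><img src="thumb

// Lazy: .*? grows one character at a time and stops at the FIRST ".jpg".
echo $l[2], "\n"; // 9961_1
```

If the match is printed inside a browser page, note that any tag-like text in it (such as the <img ...> part) may be rendered as HTML rather than shown, which can make the output look shorter than it really is.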
You can make it ungreedy as suggested by Wrikken, but I prefer to state exactly what I want to capture. In your case that looks like part of the value of the href attribute, so what I really want is at least one character that is not a quote, followed by the .jpg extension:
$pattern = '/(images\/product_images\/original_images\/)([^\'"]+)(\.jpg)/i';
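As a quick check, the negated character class can be run against the sample line from the question. This is only a sketch: the line is assigned to a variable ($line, a name chosen here) instead of being read from 1.txt. Because the pattern sits in a single-quoted PHP string, the single quote inside the character class has to be written as \'.

```php
<?php
// The fragment from 1.txt, copied from the question.
$line = 'Image" href="images/product_images/original_images/9961_1.jpg" rel="disable-zoom:false; disable-expand: false"><img src="im';

// [^\'"]+ matches one or more characters that are not quotes, so the
// second group cannot run past the closing quote of the href value.
$pattern = '/(images\/product_images\/original_images\/)([^\'"]+)(\.jpg)/i';

if (preg_match($pattern, $line, $match)) {
    print_r($match);
    // [0] => images/product_images/original_images/9961_1.jpg
    // [1] => images/product_images/original_images/
    // [2] => 9961_1
    // [3] => .jpg
}
```

The quantifier is still greedy here, but it is bounded: it first consumes 9961_1.jpg, hits the quote, and backtracks just far enough for \.jpg to match.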
Here's the basic regex:
href="((.*/)(.*?)(\.jpg))"
Do not parse HTML with regex.
Do not parse HTML with regex.
Do not parse HTML with regex.
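One way to follow that advice in PHP is to let DOMDocument read the markup and then take the path apart with pathinfo(). This is a rough sketch, not a drop-in fix: it assumes the dom extension is available, and the markup below is a hypothetical, completed version of the truncated fragment in the question (the surrounding <a> tag, the title attribute and thumb.jpg are invented for illustration).

```php
<?php
// Hypothetical complete markup; the original fragment is cut off.
$html = '<a title="Image" href="images/product_images/original_images/9961_1.jpg" '
      . 'rel="disable-zoom:false; disable-expand: false"><img src="thumb.jpg"></a>';

// Collect libxml warnings internally instead of printing them;
// real-world HTML is rarely perfectly valid.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$names = [];
foreach ($doc->getElementsByTagName('a') as $link) {
    $href = $link->getAttribute('href');
    // pathinfo() splits the path into dirname, filename and extension.
    $info = pathinfo($href);
    if (strtolower($info['extension'] ?? '') === 'jpg') {
        $names[] = $info['filename'];
    }
}

print_r($names); // Array ( [0] => 9961_1 )
```

The attribute value comes back already unquoted, so there is nothing left for a greedy quantifier to run into.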