Fixing link href and img src mismatches in a large number of HTML chunks saved in the WordPress database
After I used a caching plugin to fix numerous hotlinks, some of the generated HTML saved to the database is not quite right. For example:
<a href="http://www.mbird.com/wp-content/uploads/2011/04/psycho_blanket.jpg"><img style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer; width: 164px; height: 251px;" src="http://www.mbird.com/wp-content/uploads/2011/04/psycho_blanket1.jpg" alt="" id="BLOGGER_PHOTO_ID_5306768463834252178" border="0"></a>
In this example the src has an extra 1 before the extension. Other times the extra digit is a 2, and other times it is 21.
As you can see, the href and src don't agree. The href is right.
Suggestions for how to fix this? I'm guessing I need to run a regex against linked images in post_content to test for this. I don't have much experience with regex in PHP and need some help.
$posts = get_posts();
foreach ( $posts as $post ) {
    // retrieve content of post; get_posts() returns post objects, so use property access
    $content = $post->post_content;
    // do stuff that I'm unsure about with $content to home in on linked images with mismatched filenames and fix them
    // write it back
    $post->post_content = $content;
    // Update the post in the database
    wp_update_post( $post );
}
This tested regex solution should do it:
$re = '% # Match IMG wrapped in A element.
(<a\b[^>]+?href=")([^"]*)("[^>]*><img\b[^>]+?src=")([^"]*)("[^>]*></a>)
%ix';
$content = preg_replace($re, '$1$2$3$2$5', $content);
Given an IMG element wrapped inside an A element, this code replaces the SRC attribute of the IMG element with the HREF attribute of the A element. It assumes that all the HREF and SRC attribute values are wrapped in double quotes.
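Because the pattern above rewrites the SRC of every linked image, including thumbnails that deliberately link to a different file or page, a more conservative variant may be safer. The following is only a sketch under an assumption taken from the question: that a broken SRC is the HREF with extra digits inserted just before the extension. The function name `fix_mismatched_img_src` is hypothetical, and the added `\s*` (to allow whitespace around the IMG) is my own addition.

```php
<?php
// Hypothetical helper: only rewrites SRC when it equals the HREF
// plus extra digits before the extension (e.g. foo.jpg vs foo1.jpg or foo21.jpg).
function fix_mismatched_img_src(string $content): string
{
    // Same structure as the pattern above: IMG wrapped in A, double-quoted attributes.
    $re = '%(<a\b[^>]+?href=")([^"]*)("[^>]*>\s*<img\b[^>]+?src=")([^"]*)("[^>]*>\s*</a>)%i';

    $result = preg_replace_callback($re, function (array $m): string {
        $href = $m[2];
        $src  = $m[4];

        // Split the href at its last dot into stem and extension.
        $dot = strrpos($href, '.');
        if ($dot === false) {
            return $m[0]; // no extension, leave untouched
        }
        $stem = substr($href, 0, $dot);
        $ext  = substr($href, $dot);

        // Assumption: a broken src is stem + one or more digits + extension.
        $expected = '%^' . preg_quote($stem, '%') . '\d+' . preg_quote($ext, '%') . '$%i';
        if (preg_match($expected, $src) !== 1) {
            return $m[0]; // src is not a digit-suffixed copy of href
        }

        return $m[1] . $href . $m[3] . $href . $m[5];
    }, $content);

    // preg_replace_callback() returns null on a regex error; fall back to the input.
    return $result ?? $content;
}
```

Inside the loop from the question this would be called as `$content = fix_mismatched_img_src($content);`. Running it on a database copy first is a good idea, since nothing here has been tried against real post content.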
That's easily doable with regular expressions. But I would be lazy here and resort to phpQuery or QueryPath (it seems a one-time operation, so you don't need to watch out for performance):
$html = qp($content);
foreach ($html->find("a img") as $img) {
$img->attr("src",
$img->parent()->attr("href")
); // or maybe add some if checks here
}
$post["post_content"] = $html->top("body")->writeHTML();
Not tested. You might need a more specific selector than "a img" as well.
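If adding a library is not an option, the same idea can probably be expressed with PHP's bundled DOM extension instead of QueryPath. This is a different technique from the one above and only a rough sketch: the function name `sync_img_src_with_dom` and the wrapper `<div>` are my own assumptions, it requires the dom extension, and `saveHTML()` may normalize the markup or convert some characters to entities.

```php
<?php
// Hypothetical helper: copies the parent A's href into each directly wrapped IMG's src.
function sync_img_src_with_dom(string $content): string
{
    // Post content is an HTML fragment, so libxml warnings are expected; silence them.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    // The XML prolog is a common trick to make libxml read the input as UTF-8.
    // The wrapper div lets us pull the fragment back out afterwards.
    $doc->loadHTML('<?xml encoding="UTF-8"><div>' . $content . '</div>');

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('img') as $img) {
        $parent = $img->parentNode;
        // Only touch images whose direct parent is a link with an href.
        if ($parent instanceof DOMElement
            && strtolower($parent->nodeName) === 'a'
            && $parent->hasAttribute('href')) {
            $img->setAttribute('src', $parent->getAttribute('href'));
        }
    }

    // The first div in document order is the wrapper added above.
    $wrapper = $doc->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

As with the QueryPath version, this overwrites the src of every directly linked image, so an extra check comparing the two filenames may be worth adding before `setAttribute()`.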