
Fixing link href and img src mismatches in a large number of HTML chunks saved in the WordPress database

After using a caching plugin to fix numerous hotlinks, some of the generated HTML saved to the database is not quite right. For example:

<a href="http://www.mbird.com/wp-content/uploads/2011/04/psycho_blanket.jpg"><img style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer; width: 164px; height: 251px;" src="http://www.mbird.com/wp-content/uploads/2011/04/psycho_blanket1.jpg" alt="" id="BLOGGER_PHOTO_ID_5306768463834252178" border="0"></a>

Here the src has an extra 1 before the extension. Other times the extra digit is a 2, and other times it is 21.

As you can see, the href and src don't agree. The href is right.

Suggestions for how to fix this? I'm guessing I need to run a regex against linked images in post_content to test for it. I don't have much experience with regex in PHP and need some help.

// get_posts() returns only 5 posts by default; -1 fetches them all
$posts = get_posts( array( 'numberposts' => -1 ) );

foreach ( $posts as $post ) {

    // retrieve content of post (get_posts() returns objects, not arrays)
    $content = $post->post_content;

    // do stuff that I'm unsure about with $content to home in on linked images with mismatched filenames and fix them

    // write it back
    $post->post_content = $content;

    // update the post in the database
    wp_update_post( $post );
}


This tested regex solution should do it:

$re = '% # Match IMG wrapped in A element.
(<a\b[^>]+?href=")([^"]*)("[^>]*><img\b[^>]+?src=")([^"]*)("[^>]*></a>)
%ix';
$content = preg_replace($re, '$1$2$3$2$5', $content);

Given an IMG element wrapped inside an A element, this code replaces the SRC attribute of the IMG element with the HREF attribute of the A element. It assumes that all the HREF and SRC attribute values are wrapped in double quotes.
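One caveat with the replacement above: it copies the HREF into the SRC for every linked image, including thumbnails that link to a page or to an unrelated file. A more cautious variant, sketched below, only rewrites the SRC when it looks like the HREF with extra digits before the extension (the `1`, `2`, `21` cases from the question). The function name `fix_img_src` is hypothetical, and the sketch keeps the same assumption that attribute values are wrapped in double quotes.

```php
<?php
// Hypothetical helper (not from the answer above): copy the href into the src
// only when the two URLs differ by extra digits before the file extension.
function fix_img_src($content) {
    // Same pattern as the answer, written on one line without the x flag.
    $re = '%(<a\b[^>]+?href=")([^"]*)("[^>]*><img\b[^>]+?src=")([^"]*)("[^>]*></a>)%i';

    return preg_replace_callback($re, function ($m) {
        $href = $m[2];
        $src  = $m[4];

        // Split the href into "everything before the extension" and ".ext".
        if (!preg_match('%^(.*)(\.[a-z0-9]+)$%i', $href, $parts)) {
            return $m[0]; // href has no file extension: leave the match alone
        }

        // Accept the src only if it is the href with digits inserted before
        // the extension, e.g. photo.jpg vs. photo1.jpg or photo21.jpg.
        $expected = '%^' . preg_quote($parts[1], '%') . '\d+'
                  . preg_quote($parts[2], '%') . '$%i';
        if (!preg_match($expected, $src)) {
            return $m[0];
        }

        // Rebuild the match with the href used in both places.
        return $m[1] . $href . $m[3] . $href . $m[5];
    }, $content);
}
```

In the loop from the question this would be called as `$content = fix_img_src( $post->post_content );`. Comparing the result with the original string before calling `wp_update_post()` avoids rewriting posts that did not change.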


That's easily doable with regular expressions. But I would be lazy here and resort to phpQuery or QueryPath (it seems a one-time operation, so you don't need to watch out for performance):

$html = qp($content);

foreach ($html->find("a img") as $img) {

    $img->attr("src",
          $img->parent()->attr("href")
    );  // or maybe add some if checks here
}

$post->post_content = $html->top("body")->innerHTML();  // writeHTML() prints the markup instead of returning it

Not tested. You might need a more specific selector than "a img" as well.
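If installing phpQuery or QueryPath is not an option, the same idea can be sketched with PHP's bundled DOM extension. This is a swapped-in technique, not the library this answer recommends. The function name `sync_img_src_with_href`, the wrapper id `fragment-root`, and the list of image extensions are all assumptions made for illustration. The XPath expression is the "more specific selector": it only matches an IMG whose direct parent is an A element.

```php
<?php
// Hypothetical helper using PHP's bundled DOM extension instead of QueryPath.
function sync_img_src_with_href($content) {
    $doc = new DOMDocument();

    // Wrap the fragment so it can be pulled back out afterwards; the XML
    // encoding hint makes loadHTML() treat the input as UTF-8.
    $wrapped = '<?xml encoding="utf-8"?><div id="fragment-root">' . $content . '</div>';

    // Post content is rarely strict HTML, so collect libxml warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($wrapped);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Select only IMG elements whose direct parent is an A with an href.
    foreach ($xpath->query('//a[@href]/img[@src]') as $img) {
        $href = $img->parentNode->getAttribute('href');

        // Assumption: only hrefs that end in an image extension are trusted.
        if (!preg_match('%\.(jpe?g|png|gif)$%i', $href)) {
            continue;
        }
        if ($img->getAttribute('src') !== $href) {
            $img->setAttribute('src', $href);
        }
    }

    // Serialize the children of the wrapper back into an HTML fragment.
    $root = $xpath->query('//div[@id="fragment-root"]')->item(0);
    $html = '';
    foreach ($root->childNodes as $child) {
        $html .= $doc->saveHTML($child);
    }
    return $html;
}
```

Because the markup is parsed and re-serialized, the output may differ from the input in small ways (quoting, void-element syntax), so it is worth diffing a few posts before writing anything back to the database.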
