
Regex syntax question - trying to understand

I'm a self-taught PHP programmer and I'm only now starting to grasp regex. I'm well aware of what it can do when it is used right, but it's something I still need to dive into. So maybe someone can help me and save me some hours of experimenting.

I have this string:

here is the <a href="http://www.google.com" class="ttt" title="here"><img src="http://www.somewhere.com/1.png" alt="some" /></a> and there is <a href="#not">not</a> a chance...

Now I need to preg_match this string, find the a href tag that has an image inside it, and replace it with the same tag with one small difference: after the title attribute inside the tag, I want to add a rel="here" attribute. Of course, it should ignore links (a hrefs) that don't have an img tag inside.


First of all: never ever ever use regex for HTML!

You're much better off using an HTML parser: create a DOMDocument, load your HTML, and then use XPath to get the nodes you want.

Something like this:

$str = 'here is the <a href="http://www.google.com" class="ttt" title="here"><img src="http://www.somewhere.com/1.png" alt="some" /></a> and there is <a href="#not">not</a> a chance...';
$doc = new DOMDocument();
$doc->loadHTML($str);
$xpath = new DOMXPath($doc);
// Select the <a> elements that have an <img> child (not the <img> itself)
$results = $xpath->query('//a[img]');
foreach ($results as $result) {
    // edit the result node, e.g. add the rel attribute
    $result->setAttribute('rel', 'here');
}
echo $doc->saveHTML();
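
One thing to watch for with the snippet above: loadHTML() treats the input as a whole document, so saveHTML() hands back a DOCTYPE and html/body wrappers around your fragment. If you only want the fragment back, something like the following might do. It is a rough sketch, not a definitive implementation: the function name addRelToImageLinks is made up for illustration, the temporary <div> wrapper is just one common workaround, and it assumes a PHP/libxml build that supports the LIBXML_HTML_NOIMPLIED and LIBXML_HTML_NODEFDTD flags and input that is plain ASCII.

```php
<?php
// Hypothetical helper (name invented for this example).
// Adds a rel attribute to every <a> that has an <img> child and
// returns the fragment without DOCTYPE/html/body wrappers.
function addRelToImageLinks(string $html, string $rel): string
{
    $doc = new DOMDocument();

    // Collect libxml warnings internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    // Wrap the fragment in a <div> so it has a single root element, and ask
    // libxml not to add the implied html/body elements or a default DOCTYPE.
    $doc->loadHTML(
        '<div>' . $html . '</div>',
        LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
    );

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//a[img]') as $a) {
        // setAttribute() appends the attribute after the existing ones.
        $a->setAttribute('rel', $rel);
    }

    // Serialize only the children of the temporary <div>.
    $out = '';
    foreach ($doc->documentElement->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$str = 'here is the <a href="http://www.google.com" class="ttt" title="here"><img src="http://www.somewhere.com/1.png" alt="some" /></a> and there is <a href="#not">not</a> a chance...';
$out = addRelToImageLinks($str, 'here');
echo $out, PHP_EOL;
```

Since title is the last attribute on the link in your sample, the appended rel ends up right after it. Attribute order carries no meaning in HTML anyway, so this should not matter to a browser.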


Ideally you should use an HTML (or XML) parser for this purpose. Here is an example using PHP's built-in XML manipulation functions:

<?php
error_reporting(E_ALL);
$doc = new DOMDocument();
$doc->loadHTML('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html><body>
<p>here is the <a href="http://www.google.com" class="ttt" title="here"><img src="http://www.somewhere.com/1.png" alt="some" /></a> and there is <a href="#not">not</a> a chance...</p>
</body></html>');
$xpath = new DOMXPath($doc);
$result = $xpath->query('//a[img]');
foreach ($result as $r) {
    $r->setAttribute('rel', $r->getAttribute('title')); // I am not sure whether you want a hard-coded "here" or the value of the title
}
echo $doc->saveHTML();

Output

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html><body>
<p>here is the <a href="http://www.google.com" class="ttt" title="here" rel="here"><img src="http://www.somewhere.com/1.png" alt="some"></a> and there is <a href="#not">not</a> a chance...</p>
</body></html>


Here are a couple of links that might help you with regex:

RegEx Tutorial

Email Samples of RegEx

I used the website in the last link extensively in my previous job. It is a great collection of regexes that you can also test against your specific case. The first two links should help you gain some further knowledge about it.
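
If you still want to see what a regex for this could look like, here is a rough sketch for the exact string in the question. It is only an illustration under several assumptions: attributes are double-quoted, the link has a title attribute, the img tag follows the opening a tag directly (whitespace aside), and no attribute value contains a ">" character. Markup that differs from that shape will slip through or match wrongly, which is why a DOM parser is the sturdier choice.

```php
<?php
// Sample string from the question (with the alt quote closed properly).
$str = 'here is the <a href="http://www.google.com" class="ttt" title="here"><img src="http://www.somewhere.com/1.png" alt="some" /></a> and there is <a href="#not">not</a> a chance...';

// Group 1: an opening <a ...> tag up to and including its title="..." attribute.
// Group 2: the rest of that tag, optional whitespace, then the start of an <img> tag.
$pattern = '~(<a\b[^>]*\btitle="[^"]*")([^>]*>\s*<img\b)~i';

// Re-emit both groups with rel="here" placed right after the title attribute.
$result = preg_replace($pattern, '$1 rel="here"$2', $str);

echo $result, PHP_EOL;
```

The second link is left alone because [^>]* cannot cross the end of its opening tag, so the pattern never reaches an img there.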
