Regex syntax question - trying to understand
I'm a self-taught PHP programmer and I'm only now starting to grasp regex. I'm well aware of what it can do when it is used right, but it's something I still need to dive into, so maybe someone can help me and save me some hours of experimenting.
I have this string:
here is the <a href="http://www.google.com" class="ttt" title="here"><img src="http://www.somewhere.com/1.png" alt="some" /></a> and there is <a href="#not">not</a> a chance...
Now, I need to preg_match this string and search for the a href tag that has an image in it, and replace it with the same tag with a small difference: after the title attribute inside the tag, I want to add a rel="here" attribute. Of course, it should ignore links (a href's) that don't have an img tag inside.
First of all: never ever ever use regex for html!
You're much better off using an XML parser: create a DOMDocument, load your HTML, and then use XPath to get the node you want.
Something like this:
$str = 'here is the <a href="http://www.google.com" class="ttt" title="here"><img src="http://www.somewhere.com/1.png" alt="some" /></a> and there is <a href="#not">not</a> a chance...';
$doc = new DOMDocument();
$doc->loadHTML($str);
$xpath = new DOMXPath($doc);
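// '//a[img]' selects the <a> elements that have an <img> child
// ('//a/img' would select the <img> nodes themselves)
$results = $xpath->query('//a[img]');
foreach ($results as $result) {
    // edit result node, e.g. add the rel attribute from the question
    $result->setAttribute('rel', 'here');
}
// saveHTML() returns the markup as a string, so print it
echo $doc->saveHTML();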
Ideally you should use an HTML (or XML) parser for this purpose. Here is an example using PHP's built-in XML manipulation functions:
<?php
error_reporting(E_ALL);
$doc = new DOMDocument();
$doc->loadHTML('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html><body>
<p>here is the <a href="http://www.google.com" class="ttt" title="here"><img src="http://www.somewhere.com/1.png" alt="some" /></a> and there is <a href="#not">not</a> a chance...</p>
</body></html>');
$xpath = new DOMXPath($doc);
$result = $xpath->query('//a[img]');
foreach ($result as $r) {
    // I'm not sure whether you want a hard-coded "here" or the value of the title attribute; this copies the title
    $r->setAttribute('rel', $r->getAttribute('title'));
}
echo $doc->saveHTML();
Output
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html><body>
<p>here is the <a href="http://www.google.com" class="ttt" title="here" rel="here"><img src="http://www.somewhere.com/1.png" alt="some"></a> and there is <a href="#not">not</a> a chance...</p>
</body></html>
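If what you actually have is a bare fragment (as in the question) and you want the hard-coded value, the same idea can be wrapped in a small helper. This is only a sketch under a few assumptions: the function name addRelToImageLinks and the wrapper id fragment-root are made up for illustration, it assumes PHP 7+ for the type hints, and it assumes the input is plain ASCII (loadHTML needs extra care with other encodings).

```php
<?php
// Hypothetical helper: adds rel="$rel" to every <a> that has an <img> child
// and returns the fragment without the <html>/<body> wrapper loadHTML adds.
function addRelToImageLinks(string $html, string $rel): string
{
    // Collect libxml warnings about sloppy markup instead of printing them.
    libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    // Wrap the fragment in a known container (the id is an arbitrary choice)
    // so it can be located again after parsing.
    $doc->loadHTML('<div id="fragment-root">' . $html . '</div>');
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);

    // Only links that contain an image are touched.
    foreach ($xpath->query('//a[img]') as $a) {
        $a->setAttribute('rel', $rel);
    }

    // Serialize the container's children only, dropping the wrapper itself.
    $root = $xpath->query('//div[@id="fragment-root"]')->item(0);
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$str = 'here is the <a href="http://www.google.com" class="ttt" title="here"><img src="http://www.somewhere.com/1.png" alt="some" /></a> and there is <a href="#not">not</a> a chance...';
echo addRelToImageLinks($str, 'here');
```

The link without an image is left alone because the XPath predicate [img] filters it out before any attribute is set.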
Here are a couple of links that might help you with regex:
RegEx Tutorial
Email Samples of RegEx
I used the website in the last link extensively in my previous job. It is a great collection of regexes that you can also test against your specific case. The first two links will help you gain some further knowledge about it.