How do I match the content between all <li> tags?
How do I match all the <li> tags in the HTML below?
<ul>
<li> some content</li>
<li> some other content</li>
<li> some other other content.</li>
</ul>
This expression doesn't work:
<li>(.*)</li>
Because it returns:
some content</li>
<li> some other content</li>
<li> some other other content.
Which is the content between the first <li> and the last </li>.
Regular expression quantifiers are greedy by default. Make the match non-greedy by adding a ? after the quantifier:
<li>(.*?)</li>
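As a rough illustration of the difference, here is a minimal sketch that runs both patterns over the same input. The variable names ($html, $greedy, $lazy) are made up for this example, and the markup is kept on one line on purpose, because without the s flag the dot does not cross newlines.

```php
<?php
// Sample markup on a single line, so '.' can span every <li>.
$html = '<ul><li> some content</li><li> some other content</li></ul>';

// Greedy: .* runs to the LAST </li>, so there is a single, oversized match.
preg_match_all('~<li>(.*)</li>~', $html, $greedy);

// Lazy: .*? stops at the FIRST </li> it can reach, giving one match per item.
preg_match_all('~<li>(.*?)</li>~', $html, $lazy);

print_r($greedy[1]); // one element containing both items and the tags between them
print_r($lazy[1]);   // two elements, one per <li>
```

The ~ delimiter is used here so that the / in </li> does not have to be escaped.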
Note: I'd encourage a DOM Parser for such a thing. Check out PHP's DOMDocument.
Someone please link the Regex HTML Parser question...
There is a reason HTML parsers exist, which is to parse HTML.
This solution is a bit long, but it is versatile and works for elements with classes, ids, etc:
<?php
function innerHTML($node) {
$doc = new DOMDocument();
foreach ($node->childNodes as $child) {
$doc->appendChild($doc->importNode($child, true));
}
return $doc->saveHTML();
}
$string = "<ul>
<li> some content</li>
<li> some other content</li>
<li> some other other content.</li>
</ul>";
$document = new DOMDocument();
$document->loadHTML($string);
$ul = $document->getElementsByTagName("ul");
foreach ($ul as $element) {
print innerHTML($element);
}
?>
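To illustrate the point about classes and ids, here is a small sketch with hypothetical markup (the class and id values are invented for the example). It assumes the DOM extension is available, and compares DOMDocument with the literal <li> pattern from the question.

```php
<?php
// Hypothetical markup: the <li> tags now carry attributes.
$html = '<ul><li class="first">one</li><li id="x">two</li></ul>';

$document = new DOMDocument();
$document->loadHTML($html);

// The parser finds every <li>, whatever attributes it has.
$texts = [];
foreach ($document->getElementsByTagName('li') as $li) {
    $texts[] = $li->nodeValue;
}

// The literal pattern only matches a bare <li>, so it finds nothing here.
$count = preg_match_all('~<li>(.*?)</li>~s', $html, $m);

print_r($texts);
print $count . "\n";
```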
It seems like you don't need the tag names. Try this simpler code:
<?php
$string = "<ul>
<li> some content</li>
<li> some other content</li>
<li> some other other content.</li>
</ul>";
$document = new DOMDocument();
$document->loadHTML($string);
$ul = $document->getElementsByTagName("li");
foreach ($ul as $element) {
print $element->nodeValue;
}
?>
Try using .*? rather than .*; it is a lazy (non-greedy) quantifier that matches as little as possible.
Response to @CanSpice:
Of course regex is not suited for HTML. The OP could try something like <li>(?!.*<li>).*?</li>, depending on what they are doing, or rather use a parser. I can only direct the OP one step at a time.
Try making the regex non-greedy:
<li>(.*?)</li>
Since you are matching HTML text, I would suggest using at least the s and i flags, like this:
'~<li>(.*?)</li>~is'
- s is for DOTALL, which makes the dot (.) match all characters, including newlines
- i is for case-insensitive matching
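A quick sketch of what the two flags change, using made-up input with uppercase tags and an item that spans two lines (the variable names are illustrative only):

```php
<?php
// Hypothetical input: uppercase tags, and the first item contains a newline.
$html = "<UL>\n<LI>first\nitem</LI>\n<li>second item</li>\n</UL>";

// No flags: matching is case-sensitive and '.' stops at newlines.
preg_match_all('~<li>(.*?)</li>~', $html, $plain);

// i (ignore case) plus s (DOTALL): both items are captured.
preg_match_all('~<li>(.*?)</li>~is', $html, $flagged);

print_r($plain[1]);   // only the lowercase, single-line item
print_r($flagged[1]); // both items, including the one with a newline
```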
<?php
$str = '<ul>
<li> some content</li>
<li> some other content</li>
<li> some other other content.</li>
</ul>';
// Use ~ as the delimiter so the / in </li> does not end the pattern early.
preg_match_all('~<li>([^<]+)</li>~i', $str, $r);
print_r($r[1]);
?>
Output:
`Array ( [0] => some content [1] => some other content [2] => some other other content. ) `
var a = '<ul>'+
'<li> some content</li>'+
'<li> some other content</li>'+
'<li> some other other content.</li>'+
'</ul>'
a.split("<li>")
gives
["<ul>", " some content</li>", " some other content</li>", " some other other content.</li></ul>"]
From there we can pick whatever we want.