How do I match the content between all <li> tags?
How do I match all the <li> tags in the HTML code below:
<ul>
<li> some content</li>
<li> some other content</li>
<li> some other other content.</li>
</ul>
This expression doesn't work:
<li>(.*)</li>
Because it returns:
some content</li>
<li> some other content</li>
<li> some other other content.
Which is the content between the first <li> and the last </li>.
Quantifiers in regular expressions are greedy by default. Make the quantifier non-greedy by adding ? after it:
<li>(.*?)</li>
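The difference is easiest to see side by side. The following is only a minimal sketch: it assumes PHP's preg_match_all and a shortened, single-line version of the markup from the question, and the variable names are illustrative.

```php
<?php
// Sample markup on one line, so the dot never has to cross a newline.
$html = '<ul><li> some content</li><li> some other content</li></ul>';

// Greedy: .* runs to the last </li>, producing a single oversized match.
preg_match_all('~<li>(.*)</li>~', $html, $greedy);

// Lazy: .*? stops at the first </li> it can, producing one match per item.
preg_match_all('~<li>(.*?)</li>~', $html, $lazy);

print_r($greedy[1]);
print_r($lazy[1]);
```

The ~ delimiter is used here so that the / in </li> does not need escaping.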
Note: I'd encourage a DOM Parser for such a thing. Check out PHP's DOMDocument.
Someone please link the Regex HTML Parser question...
There is a reason HTML parsers exist, which is to parse HTML.
This solution is a bit long, but it is versatile and works for elements with classes, ids, etc:
<?php
// Return the inner HTML of a node by copying its children into a fresh document.
function innerHTML($node) {
    $doc = new DOMDocument();
    foreach ($node->childNodes as $child) {
        $doc->appendChild($doc->importNode($child, true));
    }
    return $doc->saveHTML();
}
$string = "<ul>
<li> some content</li>
<li> some other content</li>
<li> some other other content.</li>
</ul>";
$document = new DOMDocument();
$document->loadHTML($string);
// Print the inner HTML of every <ul> in the document.
$ul = $document->getElementsByTagName("ul");
foreach ($ul as $element) {
    print innerHTML($element);
}
?>
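As a rough illustration of the point about classes and ids, a DOM query can filter on attributes directly. This is a sketch only: the class name "picked" and the sample markup are made up for the example, and it relies on PHP's DOM extension (DOMDocument and DOMXPath).

```php
<?php
// Hypothetical markup: one item carries a class attribute.
$html = '<ul><li class="picked"> some content</li><li> some other content</li></ul>';

$document = new DOMDocument();
$document->loadHTML($html);

// DOMXPath lets the query filter on attributes, which a bare <li> regex cannot.
$xpath = new DOMXPath($document);
$picked = [];
foreach ($xpath->query('//li[@class="picked"]') as $li) {
    $picked[] = trim($li->textContent);
}

print_r($picked);
```

Only the first item is collected, because the second one has no class attribute.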
It seems like you don't need the tag names. Try this simpler code:
<?php
$string = "<ul>
<li> some content</li>
<li> some other content</li>
<li> some other other content.</li>
</ul>";
$document = new DOMDocument();
$document->loadHTML($string);
// Select the <li> elements directly and print the text of each one on its own line.
$items = $document->getElementsByTagName("li");
foreach ($items as $element) {
    print $element->nodeValue . PHP_EOL;
}
?>
Try using .*? rather than .* because it is a lazy (non-greedy) match that matches as little as possible.
Response to @CanSpice:
Of course regex is not suited for HTML. The OP could try something like <li>(?!.*<li>).*?</li>, depending on what he is doing, or rather use a parser. I can only direct the OP one step at a time.
Try to make the regexp non-greedy:
<li>(.*?)</li>
Since you are matching HTML text, I would suggest using at least the s and i flags, like this:
'~<li>(.*?)</li>~is'
- s is for DOTALL, which makes the dot (.) match all characters, including newlines
- i is for case-insensitive matching
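To show what the two flags change, here is a small sketch. The input is hypothetical (upper-case tags and an item whose text spans two lines) and the variable names are illustrative only.

```php
<?php
// Hypothetical input: upper-case tags and an item whose text spans two lines.
$html = "<UL>\n<LI> first\nitem</LI>\n<li> second item</li>\n</UL>";

// Without flags: the dot stops at newlines and <li> does not match <LI>,
// so only the lower-case, single-line item is found.
preg_match_all('~<li>(.*?)</li>~', $html, $plain);

// With s (DOTALL) and i (case-insensitive) both items are found.
preg_match_all('~<li>(.*?)</li>~is', $html, $flagged);

print_r($plain[1]);
print_r($flagged[1]);
```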
<?php
$str = '<ul>
<li> some content</li>
<li> some other content</li>
<li> some other other content.</li>
</ul>';
// Use ~ as the delimiter so the / in </li> does not end the pattern early.
preg_match_all('~<li>([^<]+)</li>~i', $str, $r);
print_r($r[1]);
?>
Output:
Array
(
    [0] =>  some content
    [1] =>  some other content
    [2] =>  some other other content.
)
var a = '<ul>'+
'<li> some content</li>'+
'<li> some other content</li>'+
'<li> some other other content.</li>'+
'</ul>'
a.split("<li>")
gives
["<ul>", " some content</li>", " some other content</li>", " some other other content.</li></ul>"]
From there we can pick whatever we want.