Regexp for html [duplicate]

This question already has answers here: Closed 12 years ago.

Possible Duplicate:

RegEx match open tags except XHTML self-contained tags

I have the following string:

$str = " 
<li>r</li>  
<li>a</li>  
<li>n</li>  
<li>d</li>  
...
<li>om</li>  
";

How do I get the HTML for the first n <li> tags?

Ex: n = 3; result = "<li>r<...>n</li>";

I would like a regexp if possible.


Like this.

$dom = new DOMDocument();
@$dom->loadHTML($str);
$x = new DOMXPath($dom); 

// We want the 4th <li> node.
foreach ($x->query("//li[4]") as $node)
{
  echo $node->C14N();
}

Oh yeah, learn XPath; it will save you lots of trouble in the future.
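
Since the question asks for the first n items rather than only the 4th, the same approach can probably be adapted with an XPath position() predicate. The following is only a sketch: the helper name firstListItems is made up for illustration, and it assumes all the <li> elements are siblings, as in the question's string.

```php
<?php
// Hypothetical helper (not from the original answer): returns the markup
// of the first $n <li> elements as an array of strings.
function firstListItems(string $html, int $n): array
{
    $dom = new DOMDocument();
    // Suppress the warnings libxml emits for fragment-style HTML.
    @$dom->loadHTML($html);
    $xpath = new DOMXPath($dom);

    $items = [];
    // position() is 1-based and counted among <li> siblings of the same parent.
    foreach ($xpath->query("//li[position() <= $n]") as $node) {
        // saveHTML($node) serializes just that node; trim() drops any
        // formatting whitespace the serializer may add.
        $items[] = trim($dom->saveHTML($node));
    }
    return $items;
}

$str = "<li>r</li>\n<li>a</li>\n<li>n</li>\n<li>d</li>\n<li>om</li>";
echo implode("\n", firstListItems($str, 3));
```

If the list items were nested under different parents, position() would be evaluated per parent, so the predicate would need adjusting.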


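@Byron's solution, but with SimpleXML:

// simplexml_load_string() needs a single root element, so wrap the fragment.
$xml = simplexml_load_string("<ul>$str</ul>");

foreach ($xml->xpath("//li[4]") as $node) {
  echo (string) $node; // Casting the element to a string yields its text content
}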

EDIT: Another reason I really like SimpleXML is how easy it is to debug the content of a node. You can just use print_r($xml) to print the object with its child nodes.
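
For the "first n" case asked about in the question, a SimpleXML variant might look like the sketch below. The helper name firstListItemsXml is hypothetical, and it assumes the fragment is well-formed XML apart from lacking a single root element; real-world HTML often is not, in which case DOMDocument::loadHTML() is the safer choice.

```php
<?php
// Hypothetical helper: returns the markup of the first $n <li> elements.
// Assumes $fragment is well-formed XML once wrapped in a root element.
function firstListItemsXml(string $fragment, int $n): array
{
    // simplexml_load_string() requires exactly one root element.
    $xml = simplexml_load_string("<ul>$fragment</ul>");
    if ($xml === false) {
        return []; // Not well-formed XML; nothing to extract.
    }

    $items = [];
    foreach ($xml->xpath("/ul/li[position() <= $n]") as $li) {
        // asXML() on a child element serializes just that element.
        $items[] = $li->asXML();
    }
    return $items;
}

$str = "<li>r</li>\n<li>a</li>\n<li>n</li>\n<li>d</li>\n<li>om</li>";
echo implode("\n", firstListItemsXml($str, 3));
```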


As I'm sure you are aware, it is not a good idea to use regular expressions to work through HTML unless you "tidy" it first.

A very viable solution in PHP would be to navigate the HTML structure using SimpleXML (http://php.net/manual/en/book.simplexml.php) or a DOMDocument (http://php.net/manual/en/class.domdocument.php).
