
Simple HTML DOM Parser - Skip certain element

I am using the Simple HTML DOM Parser, and I want to completely ignore the contents of the "nested" element and get the contents of the "pre" element that follows it.

<div id=parent>

<div class="nested">
<pre>Text that I want ignored</pre>
</div>

<pre>
This is the text I want to access
</pre>
</div>

I don't have control of the HTML source, and the owner has recently added the "nested" element. Previously, I accessed the content I needed like this:

$page_contents = file_get_html($url);    
$div_content = $page_contents->find('div[id=parent]pre', 0)->innertext;

But obviously the new nested element has broken my method.

I can't seem to find any official documentation regarding this kind of scenario.


Not tested, but try this:

$div_content = $page_contents->find('div[id=parent][class!=nested]pre', 0)->innertext;

or

$div_content = $page_contents->find('div[id=parent class!=nested]pre', 0)->innertext;

Or maybe even just this. I think this is really the one, but again, I have not tested it:

$div_content = $page_contents->find('div[class!=nested]pre', 1)->innertext;

I still don't know if this will work, but try this:

$div_content = $page_contents->find('div[class!=nested pre]', 0)->innertext;

or

$div_content = $page_contents->find('div[class!=nested pre]', 0)->plaintext;


find('div[id=parent] pre') finds all pre tags inside the specified div and doesn't care whether one of them is enclosed in another div, so here are a few suggestions.

If you know exactly which pre you want, just specify its index, counting from zero. In your case:

$div_content = $page_contents->find('div[id=parent] pre', 1)->innertext;

If you don't know how many pre elements there are, or don't know their order, you could blank out the one you don't want and then run the previous line again, this time with index 0:

$page_contents->find('div[id=parent] div[class=nested] pre', 0)->outertext = '';
$div_content = $page_contents->find('div[id=parent] pre', 0)->innertext;

And if you don't want to work on $page_contents directly, assign your parent div to a temporary variable and do the same as above:

$temp = $page_contents->find('div[id=parent]', 0);
$temp->find('div[class=nested] pre', 0)->outertext = '';
$div_content = $temp->find('pre', 0)->innertext;
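
Another option that avoids guessing the index is to look only at the direct children of the parent div, since the unwanted pre sits one level deeper. The following is a minimal, untested-against-your-page sketch. It assumes simple_html_dom.php has already been included elsewhere, and first_direct_child() is a hypothetical helper written for this answer, not part of the library. It relies only on the library's children() method and tag property.

```php
<?php
// Assumption: simple_html_dom.php has been loaded before this runs,
// e.g. with require_once 'simple_html_dom.php';

/**
 * Hypothetical helper (not part of Simple HTML DOM).
 * Returns the first direct child of $parent whose tag name is $tag,
 * or null if no direct child matches.
 */
function first_direct_child($parent, $tag)
{
    foreach ($parent->children() as $child) {
        if ($child->tag === $tag) {
            return $child;
        }
    }
    return null;
}

// Same markup as in the question.
$html = <<<HTML
<div id="parent">
<div class="nested">
<pre>Text that I want ignored</pre>
</div>
<pre>
This is the text I want to access
</pre>
</div>
HTML;

// The demo below runs only when the library is actually loaded.
if (function_exists('str_get_html')) {
    $page_contents = str_get_html($html);
    $parent = $page_contents->find('div[id=parent]', 0);
    $pre = $parent ? first_direct_child($parent, 'pre') : null;
    if ($pre !== null) {
        $div_content = trim($pre->innertext);
        echo $div_content, PHP_EOL;
    }
}
```

Because the nested pre is a grandchild of the parent div rather than a child, the loop never sees it, so this should keep working even if the site owner adds more wrapped pre blocks before or after the one you need.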

Of course, there are plenty of other ways to do this. It is worth reading the manual at http://simplehtmldom.sourceforge.net/manual.htm, though it covers only the main features; a lot more is under the hood.
