How to use PHP Simple HTML DOM Parser to find text that is not hyperlinked

I want to parse HTML into a DOM tree and find all the text that is NOT inside <a> tags. I searched around and found "PHP Simple HTML DOM Parser", which seems to be able to parse HTML into a DOM tree. However, I can only find elements that are inside <a> tags, not the text outside them. (PS: it doesn't support the CSS3 :not selector yet.) Does anyone have experience with this? Thank you.


I hope I'm not misunderstanding the question, but can't you use PHP's built-in DOM extension to find the text inside the <a> tags?

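$doc = new DOMDocument();
// Collect libxml warnings instead of printing them; real-world HTML is often malformed
libxml_use_internal_errors(true);
$doc->loadHTMLFile("http://blahblah.com/blah.html");
$elem_list = $doc->getElementsByTagName("a");
foreach ($elem_list as $elem) {
    echo $elem->textContent, "\n"; // text of each link
}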

If what you want is the text outside the links, I would remove all <a> tags and their contents (for example with regular expressions) and then load the resulting HTML into your DOM parser of choice.
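A minimal sketch of that removal step, done on the parsed document with PHP's DOM extension instead of a regular expression. It assumes the HTML is already held in a string; the helper name textWithoutLinks() is hypothetical.

```php
<?php
// Sketch: delete every <a> element (including its contents), then read the text that is left.
// textWithoutLinks() is a hypothetical helper name used only for illustration.
function textWithoutLinks(string $html): string
{
    if (trim($html) === '') {
        return ''; // loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // Malformed HTML triggers libxml warnings; collect them instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // getElementsByTagName() returns a live list, so copy it before removing nodes.
    $links = iterator_to_array($doc->getElementsByTagName('a'));
    foreach ($links as $link) {
        $link->parentNode->removeChild($link);
    }

    // Whatever text remains in the document is, by construction, outside any link.
    return $doc->documentElement ? $doc->documentElement->textContent : '';
}

echo textWithoutLinks('<p>Hello <a href="/x">link text</a> world</p>'), "\n";
```

Note that DOMDocument::loadHTML() treats input as ISO-8859-1 unless the markup declares a charset, so pages with non-ASCII text may need a charset declaration before parsing.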

Update: Even better, parse the HTML right away and use the built-in DOM functions to remove the <a> tags, or loop through all the nodes and simply skip the <a> tags. Using regular expressions on HTML should be avoided.
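The second option, skipping links rather than deleting them, can be sketched with DOMXPath: the expression //text()[not(ancestor::a)] selects every text node that has no <a> ancestor. This is a sketch under the assumption that the HTML is in a string; textOutsideLinks() is a hypothetical helper name, and text inside <script> or <style> would still be included.

```php
<?php
// Sketch: leave the tree intact and only collect text nodes that are not inside an <a> element.
// textOutsideLinks() is a hypothetical helper name used only for illustration.
function textOutsideLinks(string $html): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $pieces = [];
    // Every text node in the document whose ancestors include no <a> element
    foreach ($xpath->query('//text()[not(ancestor::a)]') as $node) {
        $text = trim($node->nodeValue);
        if ($text !== '') {
            $pieces[] = $text; // one entry per non-blank text node
        }
    }
    return $pieces;
}

print_r(textOutsideLinks('<div>Intro <a href="#">skip me</a> outro <b>bold <a>also skipped</a></b></div>'));
```

Returning one entry per text node keeps the pieces separate; join them with implode(' ', ...) if a single string is more convenient.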


I have used this class many times. It's an excellent solution for parsing HTML/DOM in PHP.

include 'simple_html_dom.php';

$html = new simple_html_dom();
// Load your HTML as a string
$html->load('........ HTML ..........');
$text = '';
// Concatenate the plain text (without markup) of every <a> element
foreach ($html->find('a') as $a) {
    $text .= $a->plaintext;
}

The variable $text then contains all the text inside the <a> tags. Hope this helps.
