Counting words with the DOMDocument class
How can I count the words in an HTML page with DOMDocument?
For example, if the input is something like:
<div> Hello something open. <a href="open.php">click</a>
lorem ipsum <a href="open.php">here></a>
the output should be:
Number Word
1 Hello
2 something
3 open
4 click
5 lorem
6 ipsum
7 here
And what if I need only the link text?
click 4
here 7

If you need this for the entire document, it is likely easier to just strip_tags
and then run str_word_count
on the result.
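As a minimal sketch of that approach, the snippet below uses markup adapted from the question (the variable names are only illustrative). Passing 1 as the second argument makes str_word_count return the words as an array instead of just the count.

```php
<?php
// Sample markup adapted from the question (illustrative only).
$html = '<div> Hello something open. <a href="open.php">click</a>' . "\n"
      . 'lorem ipsum <a href="open.php">here</a></div>';

// strip_tags() drops the markup and leaves the text content.
$text = strip_tags($html);

// Format 1 returns an array of words; the default format 0 returns only the count.
$words = str_word_count($text, 1);

print_r($words);
echo count($words), "\n";
```

One thing to watch for: strip_tags does not insert whitespace where a tag used to be, so markup such as <p>one</p><p>two</p> would come out as a single word. If that matters, the DOM approach is the safer choice.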
If you have to do this with DOM, you can do the following:
$str = <<< HTML
<div> Hello something open. <a href="open.php">click</a>
lorem ipsum <a href="open.php">here></a></div>
HTML;
$dom = new DOMDocument;
$dom->loadHTML($str);
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//text()');
$textNodeContent = '';
foreach ($nodes as $node) {
    // Prepend a space so words from adjacent text nodes do not run together.
    $textNodeContent .= " $node->nodeValue";
}
print_r(str_word_count( $textNodeContent, 1 ));
Using //text() as the XPath expression gives you only the text nodes in the document.
To return just the link texts, use //a/text() as the expression instead.
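If you also need the position of each link word within the whole document, as in the question's "click 4, here 7" output, one possible sketch is to walk every text node in document order, number the words as you go, and keep only those inside an a element. The variable names here are illustrative, and the sample markup is adapted from the question.

```php
<?php
// Sample markup adapted from the question (illustrative only).
$html = '<div> Hello something open. <a href="open.php">click</a>' . "\n"
      . 'lorem ipsum <a href="open.php">here</a></div>';

$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$position  = 0;   // running word number across the whole document
$linkWords = [];  // word number => word, for words inside links only

foreach ($xpath->query('//text()') as $node) {
    // True when this text node sits anywhere inside an <a> element.
    $inLink = $xpath->query('ancestor::a', $node)->length > 0;

    foreach (str_word_count($node->nodeValue, 1) as $word) {
        $position++;
        if ($inLink) {
            $linkWords[$position] = $word;
        }
    }
}

print_r($linkWords);
```

This counts words per text node, so a word split by inline markup (for example foo<b>bar</b>) would be counted as two words.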