How do I retrieve all text in an HTML DOM but exclude SCRIPT and STYLE tags?
I know how to quickly extract text nodes from a DOM:
document.evaluate('//text()', document, null, XPathResult.ANY_TYPE, null)
But is there an easy way to exclude text from SCRIPT, STYLE, or other tags that are not shown to the user?
Something like:
'//text()[ parent.name not in ("SCRIPT", "STYLE") ]'
Thanks, Mike
//*[not(self::script or self::style)]/text()
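For anyone who wants to run that expression from script, here is a minimal sketch. It assumes a browser environment and an HTML (not XHTML) document, where lowercase element names in XPath match HTML elements. The helper names `buildTextXPath` and `collectVisibleText` are made up for illustration and are not part of any library.

```javascript
// Hypothetical helper: builds an XPath selecting text nodes whose parent
// element is not one of the excluded tags.
function buildTextXPath(excludedTags) {
  if (excludedTags.length === 0) {
    return '//text()'; // nothing to exclude
  }
  const tests = excludedTags.map((tag) => `self::${tag.toLowerCase()}`);
  return `//*[not(${tests.join(' or ')})]/text()`;
}

// Hypothetical helper: evaluates the expression and returns the non-empty,
// trimmed text of each matching node in document order.
// Assumes XPathResult is available as a global, as it is in browsers.
function collectVisibleText(doc, excludedTags = ['script', 'style']) {
  const result = doc.evaluate(
    buildTextXPath(excludedTags),
    doc,
    null,
    XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, // snapshot: safe to index
    null
  );
  const parts = [];
  for (let i = 0; i < result.snapshotLength; i++) {
    const text = result.snapshotItem(i).nodeValue.trim();
    if (text) parts.push(text);
  }
  return parts;
}
```

In a browser console, `collectVisibleText(document).join(' ')` would give the text as one string. Note that this only checks the immediate parent of each text node, and it does not account for elements hidden with CSS.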
Besides Nick Jones' correct answer, for more complex exclusions you should use the XPath node-set exclusion expression, which selects the nodes of $ns1 that are not in $ns2:
$ns1[not(count(.|$ns2)=count($ns2))]
In this case, with //* as $ns1 and //script|/*/*/style as $ns2:
//*[not(count(.|//script|/*/*/style)=count(//script|/*/*/style))]/text()
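If you need to build such expressions for different node sets, the pattern can be wrapped in a small string helper. This is only a sketch: `excludeNodeSet` is a hypothetical name, and it parenthesizes the first node set so that a union can be passed as well. For a predicate like this one, which does not depend on position, the parenthesized form selects the same nodes as the expression above.

```javascript
// Hypothetical helper: returns an XPath selecting the nodes of ns1 that are
// not members of ns2, using the count-based membership test.
// A node is in ns2 exactly when adding it to ns2 does not change the count.
function excludeNodeSet(ns1, ns2) {
  return `(${ns1})[not(count(.|${ns2})=count(${ns2}))]`;
}

// All elements except script elements and style elements that are
// grandchildren of the root, then the text node children of what remains.
const xpath = excludeNodeSet('//*', '//script|/*/*/style') + '/text()';
```

The resulting string can be passed to document.evaluate in the same way as the simpler expression.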