How can I get the text only (no tags) from an HTML document?
I have an HTML page, and I want the text only (all text nodes).
Example HTML
<span>hello <strong>sir</strong></span>
Desired Output
hello sir
Assuming you only want the children of the body element...
Example HTML
<html><head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<title> Example</title>
</head>
<body>
a <div>b<span>c</span></div>
</body></html>
JavaScript
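var body = document.body;
// textContent is the standard property; innerText is the fallback for old IE
var textContent = body.textContent || body.innerText;
console.log(textContent); // a bc (plus the whitespace around it in the source)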
You need to check for textContent because our good friend IE (before version 9) does not support it and uses innerText instead.
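Since the question asks for all text nodes, walking the tree by hand is another option. The sketch below is only an illustration: collectText is a hypothetical helper name, not a built-in, and the span object is a hand-made stand-in for the question's markup so that the snippet also runs outside a browser. In a page you would pass a real node such as document.body.

```javascript
var TEXT_NODE = 3; // same numeric value as Node.TEXT_NODE in the DOM

// Hypothetical helper: concatenate the value of every text node under `node`.
// Only nodeType, nodeValue and childNodes are read, so it accepts real DOM
// nodes as well as plain objects with the same shape.
function collectText(node) {
  if (node.nodeType === TEXT_NODE) {
    return node.nodeValue;
  }
  var text = '';
  var children = node.childNodes || [];
  for (var i = 0; i < children.length; i++) {
    text += collectText(children[i]);
  }
  return text;
}

// Assumed stand-in for <span>hello <strong>sir</strong></span>
var span = {
  nodeType: 1,
  childNodes: [
    { nodeType: 3, nodeValue: 'hello ' },
    { nodeType: 1, childNodes: [{ nodeType: 3, nodeValue: 'sir' }] }
  ]
};

console.log(collectText(span)); // hello sir
```

Like textContent, this also picks up the text inside script and style elements, so skip those elements in the loop if they matter to you.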
It is much easier if you have a library such as jQuery, e.g. $('body').text().
Also, it can be achieved on the server side, for example with strip_tags() in PHP. However, if you only want the body element, you'd need to drill down to it using a DOM parser such as DOMDocument.
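If you only have the markup as a string and no DOM to query, a very rough JavaScript analogue of tag stripping might look like the sketch below. stripTags is a hypothetical name, and the regex is an assumption that only holds for simple markup: it does not decode entities, it leaves the contents of script and style elements in place, and it breaks on a ">" inside an attribute value. It is not a port of PHP's strip_tags().

```javascript
// Hypothetical helper: remove anything that looks like <...> from a string.
// Assumes simple markup; this is not a real HTML parser.
function stripTags(html) {
  return html.replace(/<[^>]*>/g, '');
}

console.log(stripTags('<span>hello <strong>sir</strong></span>')); // hello sir
```

For anything beyond trusted, simple input, a real parser (the browser DOM, or DOMDocument on the PHP side) is the safer choice.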
Assuming you are trying to get the HTML for the page your JS is running on:
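var elems = document.getElementsByTagName('*');
var result = '';
// Use an indexed loop: for...in on an HTMLCollection also visits
// non-element keys such as length and item.
for (var i = 0; i < elems.length; i++) {
    result += elems[i].innerHTML || '';
}
alert(result);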
I am not sure I completely understand, but if you want the markup for the current page, you could make an Ajax request against the current page and use the response:
$.get("/current-page-name", function(data) {
    console.log(data);
});
http://jsfiddle.net/magicaj/CAWkx/