Using JavaScript and regular expressions to get the content inside the HTML body [duplicate]
Possible Duplicate:
how to extract body contents using regexp
I have a response text that contains a full page (html, head, body). I want only the content inside the body. How can I achieve this using a regex? Please help.
A DOM parser is the most reliable method for extracting data like this, but a regex can do a pretty decent job if the HTML is sane, i.e. the text "<body" or "</body" does not occur inside comments, scripts, stylesheets, CDATA sections or attribute values, and the BODY element's start tag attributes do not contain the ">" character. This regex captures the contents of the first innermost BODY element (there should only ever be one):
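```javascript
// `text` is the full HTML response string.
var bodytext = '';
var m = text.match(/<body[^>]*>([^<]*(?:(?!<\/?body)<[^<]*)*)<\/body\s*>/i);
// On success, m[1] holds everything between <body ...> and </body>.
if (m) bodytext = m[1];
```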
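A minimal sketch that wraps the same regex in a reusable function (the name `extractBodyContent` is hypothetical) and exercises it on a sane page, on a page that breaks the "sane HTML" assumption (a script containing the literal string "</body>"), and on input with no body element.

```javascript
// Hypothetical helper wrapping the regex from the answer.
// Returns the text between <body ...> and </body>, or '' if nothing matches.
function extractBodyContent(text) {
  var m = text.match(/<body[^>]*>([^<]*(?:(?!<\/?body)<[^<]*)*)<\/body\s*>/i);
  return m ? m[1] : '';
}

// Sane input: start-tag attributes and mixed-case tag names are handled
// (the /i flag makes the match case-insensitive).
var page = '<html><head><title>T</title></head>' +
           '<BODY class="main"><p>Hello</p></Body></html>';
console.log(extractBodyContent(page)); // <p>Hello</p>

// Not sane: "</body>" inside a script string ends the capture early.
var tricky = '<html><body><script>var s = "</body>";</script>' +
             '<p>Hi</p></body></html>';
console.log(extractBodyContent(tricky)); // <script>var s = "

// No body element at all: falls back to the empty string.
console.log(extractBodyContent('<p>fragment</p>') === ''); // true
```

The second call shows why a DOM parser is preferable for arbitrary input: the regex cannot tell that the "</body>" sits inside a script string.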
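It uses Jeffrey Friedl's "Unrolling the Loop" efficiency technique, so it is quite fast: [^<]* skips over whole runs of characters that cannot start a tag, and the (?!<\/?body) lookahead is only tested at each "<".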
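A rough sketch to illustrate the speed claim. It compares the unrolled pattern with a "tempered" equivalent, introduced here only for comparison (it is not from the answer), that runs the lookahead before every single character. Both patterns accept the same characters, so they should capture identical text; the timings depend on the engine and machine and are not a rigorous benchmark.

```javascript
// Unrolled form (from the answer): normal* (special normal*)*
//   normal  = [^<]             any character except '<'
//   special = (?!<\/?body)<    a '<' that does not start <body or </body
var unrolled = /<body[^>]*>([^<]*(?:(?!<\/?body)<[^<]*)*)<\/body\s*>/i;

// Comparison pattern (hypothetical): the lookahead runs at every character.
var tempered = /<body[^>]*>((?:(?!<\/?body)[\s\S])*)<\/body\s*>/i;

// Build a moderately large body containing many '<' characters.
var html = '<html><body>' + new Array(5001).join('<p>some text</p>\n') +
           '</body></html>';

function capture(re, text) {
  var m = text.match(re);
  return m ? m[1] : null;
}

// Both patterns should capture exactly the same text.
console.log(capture(unrolled, html) === capture(tempered, html)); // true

// Rough, environment-dependent timing; not a rigorous benchmark.
[['unrolled', unrolled], ['tempered', tempered]].forEach(function (pair) {
  var start = Date.now();
  for (var i = 0; i < 20; i++) capture(pair[1], html);
  console.log(pair[0] + ': ' + (Date.now() - start) + ' ms');
});
```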