JavaScript regex oddity in Chrome
I freely admit that my comprehension of regular expressions is spotty. That said, I can't make head or tail of this. This only happens in Chrome.
I have this bit of code to pull out the text between body tags in an HTML string:
var extractBodyHtml = function (obj) {
    var regex = /<body.*?>([\s\S]*?)<\/body>/g;
    //if (obj.match(regex)) {
    if (regex.test(obj)) {
        return RegExp.$1;
    } else {
        return obj;
    }
};
Update
I cannot reproduce this in a fiddle. In fact, the exact same code works in one place, against the same HTML, but not in another. Lest you think I am crazy, here's the debugger.
(source: outsharked.com)
Note the commented line. That was the first version. It worked, sometimes. In other situations, RegExp.$1 would return just a single character, "r". This is always reproducible for a particular situation.
Note that obj.match(regex) always returns the correct match (including the body tags) but accessing the backreference would give the "r" sometimes.
When I changed the code to regex.test(obj) things always work correctly, and RegExp.$1 returns the inner content.
What am I doing wrong?
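You should (almost) never use a regular expression to parse HTML.
Whatever response you get from your AJAX requests, you can pass it to jQuery's constructor (if it's valid HTML). You can then parse it with jQuery's regular methods. Be aware that browsers may drop the html, head, and body elements when a full document is parsed this way, so check that find('body') actually matches something in your case: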
$.get('path/to/html', function(data){
    // "data" will hold your entire html returned
    var theHTML = $(data).find('body').html(); // this'll have what you're looking for
});
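If you do stick with a regex for this one narrow case, it is probably safer not to read RegExp.$1 at all. It is a deprecated static property that reflects the last successful match of any regex on the page, so other code running in between can overwrite it. I can't say for certain that this is what produced the "r" you saw, but reading the capture group from the match result avoids the question entirely. A minimal sketch, assuming the same pattern as in the question with the g flag dropped (the rewritten function body and the sample string are mine, not taken from your page):

```javascript
// Hypothetical rewrite of extractBodyHtml from the question.
// Assumption: same pattern as the original, minus the "g" flag, so that
// exec() returns the capture groups of the first match.
var extractBodyHtml = function (obj) {
    var match = /<body.*?>([\s\S]*?)<\/body>/.exec(obj);
    // match[1] holds the text captured between the body tags.
    // Fall back to the input when there is no match, as the original did.
    return match ? match[1] : obj;
};

// Made-up sample input for illustration.
var sample = '<html><head></head><body class="x"><p>Hi</p></body></html>';
console.log(extractBodyHtml(sample));         // <p>Hi</p>
console.log(extractBodyHtml('no body here')); // no body here
```

The result array returned by exec() belongs to that one call, so nothing else on the page can change it before you read it.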