Javascript regex oddity in Chrome

I freely admit that my comprehension of regular expressions is spotty. That said, I can't make head or tail of this. This only happens in Chrome.

I have this bit of code to pull out the text between body tags in an HTML string:

var extractBodyHtml = function (obj) {
    var regex = /<body.*?>([\s\S]*?)<\/body>/g;
    //if (obj.match(regex)) {
    if (regex.test(obj)) {
        return RegExp.$1;
    } else {
        return obj;
    }
};

Update

I cannot reproduce this in a fiddle. In fact the exact same code works in one place, against the same HTML, but not another. Lest you think I am crazy here's the debugger.

[Screenshot of the debugger session; source: outsharked.com]

Note the commented line. That was the first version. It worked, sometimes. In other situations, RegExp.$1 would return just a single character, "r". This is always reproducible for a particular situation.

Note that obj.match(regex) always returns the correct match (including the body tags) but accessing the backreference would give the "r" sometimes.

When I changed the code to regex.test(obj) things always work correctly, and RegExp.$1 returns the inner content.

What am I doing wrong?


You should (almost) never use a regular expression to parse HTML.

Whatever response you get from your Ajax requests, you can pass it to jQuery's constructor (if it's valid HTML). You can then query it with jQuery's regular methods:

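$.get('path/to/html', function (data) {
    // "data" holds the entire HTML string returned by the server
    // Caveat: jQuery parses the string via innerHTML, and some browsers may drop
    // document-level elements such as <html> or <head>, so check that
    // find('body') actually matched something before relying on the result.
    var theHTML = $(data).find('body').html(); // the markup inside <body>
});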
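
If you do stick with a regular expression, it may help to stop reading RegExp.$1 altogether. It is a legacy static property shared by every regex on the page, so any other successful match that runs between your match and your read can overwrite it. That could explain the stray "r", though I can't confirm it from the question. Note also that with the g flag, obj.match(regex) returns only the full matches, not the capture groups. A minimal sketch, assuming the same pattern as in the question minus the g flag (the sample string is made up for illustration):

```javascript
// Sketch: same pattern as the question, but without the g flag and
// without the legacy static RegExp.$1.
function extractBodyHtml(html) {
    var regex = /<body.*?>([\s\S]*?)<\/body>/;
    // exec returns null when nothing matches; otherwise an array whose
    // index 0 is the full match and index 1 is the first capture group.
    var match = regex.exec(html);
    return match ? match[1] : html;
}

// Hypothetical input, only for demonstration.
var sample = '<html><head><title>t</title></head>' +
             '<body class="x"><p>Hi</p></body></html>';

console.log(extractBodyHtml(sample));         // <p>Hi</p>
console.log(extractBodyHtml('no body here')); // no body here
```

Reading the capture from the array that exec returns keeps the result local to the call, so nothing else on the page can change it.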