JavaScript regex for matching whole words

This is a follow-up question to this one.

Since JavaScript regex is quite different from .NET regex (which I'm used to), I can't figure out how to extend this pattern.

Here's the current pattern:

var pattern = new RegExp('\\b' + filter[i] + '\\b', 'g');

This works well when the phrase stands alone, but if it's inside an anchor tag, the method ends up removing the entire anchor, which is not what I want.
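The pattern is built with the RegExp constructor from a string, so every regex backslash has to be doubled in the string literal. A small check of that, using 'ass' from the example as a sample filter word (the variable names are illustrative only):

```javascript
// Sample filter word taken from the example in this question.
const word = 'ass';

// Built from a string: '\\b' in source code is the two characters \ and b,
// which the RegExp constructor reads as a word boundary.
const fromString = new RegExp('\\b' + word + '\\b', 'g');
// The equivalent regex literal, where \b is written only once.
const fromLiteral = /\bass\b/g;

console.log(fromString.source === fromLiteral.source); // true

// A single '\b' inside a string literal is a backspace character (U+0008),
// so new RegExp('\b' + word + '\b') would look for backspaces, not boundaries.
console.log('\b'.charCodeAt(0)); // 8
```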

Example

<body>
    This is my text. It's an ass of a time in class
    <a href="http://example.com/1234/ass-hole">ass-hole</a>
</body>

shows up as

<body> This is my text. It's an *** of a time in class ***-hole </body>

in the DOM

What I want it to look like is

<body>
    This is my text. It's an *** of a time in class
    <a href="http://example.com/1234/***-hole">***-hole</a>
</body>


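It looks like $('body').text(function (i, txt) { ... }); is passing you the text of the body element as one big block, with all of the tags already removed. Because .text(...) then writes your result back as plain text, the anchor element is replaced along with everything else. In other words, your regex is not removing the tags; $('body').text is.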

It sounds like you actually want to loop over the descendant text nodes of the body. I'm not very familiar with jQuery; perhaps it has a function that does this for you. If it doesn't, you can use this one:

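function allTextNodes(parent) {

    // Copy a live NodeList into a plain array.
    function getChildNodes(parent) {
        var x, out = [];
        for (x = 0; x < parent.childNodes.length; x += 1) {
            out[x] = parent.childNodes[x];
        }

        return out;
    }

    // open: nodes still to visit; closed: text nodes found so far.
    var cursor, closed = [], open = getChildNodes(parent);

    while (open.length) {
        cursor = open.shift();
        if (cursor.nodeType === 1) {
            // Element node: visit its children next, preserving document order.
            // Note: this also descends into <script> and <style> elements.
            open.unshift.apply(open, getChildNodes(cursor));
        }
        if (cursor.nodeType === 3) {
            // Text node: collect it.
            closed.push(cursor);
        }
    }

    return closed;
}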
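The same depth-first collection can also be written recursively. The sketch below only reads nodeType and childNodes, so it can be tried outside a browser with plain objects standing in for DOM nodes; allTextNodesRecursive and the el/txt helpers are hypothetical names, not DOM APIs. In a browser, document.createTreeWalker(root, NodeFilter.SHOW_TEXT) is a built-in way to visit the same text nodes.

```javascript
// Recursive depth-first walk: collects text nodes (nodeType 3) and
// descends into element nodes (nodeType 1), in document order.
function allTextNodesRecursive(parent, out) {
  out = out || [];
  for (let i = 0; i < parent.childNodes.length; i += 1) {
    const node = parent.childNodes[i];
    if (node.nodeType === 3) {
      out.push(node);
    } else if (node.nodeType === 1) {
      allTextNodesRecursive(node, out);
    }
  }
  return out;
}

// Hypothetical stand-ins for DOM nodes so the sketch runs without a browser.
function el(...children) {
  return { nodeType: 1, childNodes: children };
}
function txt(value) {
  return { nodeType: 3, nodeValue: value, childNodes: [] };
}

// Mirrors the <body> from the question: a text node, then an <a> element.
const body = el(
  txt("This is my text. It's an ass of a time in class"),
  el(txt('ass-hole'))
);

const values = allTextNodesRecursive(body).map((n) => n.nodeValue);
console.log(values);
```

The explicit open/closed version avoids call-stack depth limits on very deeply nested documents; the recursive one is shorter.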

Using that function (or one like it), try this usage instead:

(function () {
    var x, i, re, rep,
        nodes = allTextNodes(document.body),
        filter = [ 'some', 'words', 'go', 'here' ];

    for (x = 0; x < nodes.length; x += 1) {
        for (i = 0; i < filter.length; i += 1) {
            // Whole-word match; '\\b' in the string becomes \b in the regex.
            re = new RegExp('\\b' + filter[i] + '\\b', 'g');
            rep = '****'; // fix this: build the replacement you actually want
            // Only assign when something matched, so other nodes are left alone.
            if (re.test(nodes[x].nodeValue)) {
                nodes[x].nodeValue = nodes[x].nodeValue.replace(re, rep);
            }
        }
    }
}());
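One possible way to fill in the rep placeholder, sketched as a pure function on strings so it can be tested without a DOM: mask each whole-word match with one asterisk per character (as in the *** of the desired output) and compile each pattern once rather than once per node. makeMasker is a hypothetical helper, not part of any library.

```javascript
// Hypothetical helper: returns a function that masks every whole-word
// occurrence of the filter words with one '*' per character.
function makeMasker(filter) {
  // Compile each whole-word pattern once, instead of once per text node.
  const patterns = filter.map(function (word) {
    return new RegExp('\\b' + word + '\\b', 'g');
  });

  return function (text) {
    return patterns.reduce(function (result, re) {
      // replace() with a global regex starts from lastIndex 0 on its own,
      // so reusing the same RegExp objects across calls is safe.
      return result.replace(re, function (match) {
        return '*'.repeat(match.length);
      });
    }, text);
  };
}

const mask = makeMasker(['ass']);
console.log(mask("It's an ass of a time in class")); // It's an *** of a time in class
console.log(mask('ass-hole'));                       // ***-hole

// In a browser, with allTextNodes from above (not run here):
// var m = makeMasker(filter);
// allTextNodes(document.body).forEach(function (n) { n.nodeValue = m(n.nodeValue); });
```

Attribute values such as the href in the example are not text nodes, so neither this sketch nor the loop above touches them; masking the URL as well would need a separate pass over the relevant attributes.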

Food for thought: what will happen if a filter word contains a character that has special meaning in a regex, such as . or +? It seems unlikely in this case, but you should consider it all the same.
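A common way to handle that is to escape regex metacharacters in each filter word before building the pattern. A minimal sketch; escapeRegExp and wholeWord are this example's own names, not built-ins:

```javascript
// Escape characters that have special meaning in a regex so that a
// filter word is matched literally. '$&' inserts the matched character.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build a whole-word pattern from an arbitrary filter word.
function wholeWord(word) {
  return new RegExp('\\b' + escapeRegExp(word) + '\\b', 'g');
}

console.log(wholeWord('a.b').test('a.b'));                  // true
console.log(wholeWord('a.b').test('axb'));                  // false: '.' is now literal
console.log(new RegExp('\\b' + 'a.b' + '\\b').test('axb')); // true: unescaped '.' matches any character
```

Escaping makes the match literal, but \b still needs a word character on one side: a filter word that starts or ends with a non-word character, such as c++, will not match before a space or at the end of the text with this pattern.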


There's no way that regex could remove what you say it removed. The problem is that the input isn't what you think it is. If you add

alert(txt);

to your function, you'll see that you're actually passing

This is my text. It's an ass of a time in class ass-hole

to it. This is the body's text, with the markup already stripped. Perhaps you want its innerHTML instead.
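Running the pattern on that flattened string reproduces exactly the output from the question, which shows the regex is behaving correctly and the tags were already gone before it ran. A minimal check, with 'ass' standing in for the filter word:

```javascript
// The body's text as the callback receives it: no markup left.
const txt = "This is my text. It's an ass of a time in class ass-hole";
// The question's pattern, with 'ass' as the sample filter word.
const pattern = new RegExp('\\b' + 'ass' + '\\b', 'g');

console.log(txt.replace(pattern, '***'));
// This is my text. It's an *** of a time in class ***-hole
```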

Next time, please post a minimal, runnable demonstration of the problem up front. It's really unhelpful when you say you have a problem with a substitution and the code you post doesn't perform any substitution.


The problem here is that you're matching \b on either side of the word. This means the word has to be surrounded by certain characters, and '>' is not one of them.

So you need to change your regex to allow '>' on the left side and probably '<' on the right.

var pattern = new RegExp('(\\b|>)' + filter[i] + '(\\b|<)', 'g');

Is probably pretty close to what you need.
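Before widening the pattern, it is easy to check how \b treats '>' and '<' in JavaScript: a word boundary is any position between a word character ([A-Za-z0-9_]) and a non-word character or the edge of the string, and both '>' and '<' are non-word characters. A quick test against the anchor markup from the question:

```javascript
// The anchor from the question, as raw markup.
const markup = '<a href="http://example.com/1234/ass-hole">ass-hole</a>';
// The question's original pattern, with 'ass' as the sample filter word.
const pattern = new RegExp('\\b' + 'ass' + '\\b', 'g');

// '>' is a non-word character, so \b matches right after it.
console.log(/>\bass/.test('>ass-hole')); // true

// The unmodified pattern already masks both occurrences in the markup.
console.log(markup.replace(pattern, '***'));
// <a href="http://example.com/1234/***-hole">***-hole</a>
```

Running replacements over raw markup is fragile in general, though: a filter word that also appears in a tag or attribute name (say, href) would be rewritten too.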

A reference for JavaScript regular expression syntax can be found here: http://www.javascriptkit.com/javatutors/redev2.shtml
