JavaScript Regex for matching whole words

2023-02-28 16:10

This is a follow-up question to this one

Since JavaScript regex is quite different from .NET regex (which I'm used to), I can't seem to figure out how to enhance this regex.

Here's the current pattern:

var pattern = new RegExp('\\b' + filter[i] + '\\b', 'g');

This works great when the phrase stands alone but if it's located in an anchor tag, the method ends up removing the entire anchor (which is not desirable).

Example

<body>
    This is my text. It's an ass of a time in class
    <a href="http://example.com/1234/ass-hole">ass-hole</a>
</body>

shows up as

<body> This is my text. It's an *** of a time in class ***-hole </body>

in the DOM

What I want it to look like is

<body>
    This is my text. It's an *** of a time in class
    <a href="http://example.com/1234/***-hole">***-hole</a>
</body>

It looks like $('body').text(function (i, txt) { ... }); is giving you the inner text of the body element in one big block, with all of the tags already removed. In other words, your regex is not removing tags, but $('body').text is.

It sounds like you actually want to loop over the descendant text nodes of the body. I'm not familiar with jQuery; perhaps it has another function that does this for you, but if it doesn't, you can use this one:

function allTextNodes(parent) {

    function getChildNodes(parent) {
        var x, out = [];
        for (x = 0; x < parent.childNodes.length; x += 1) {
            out[x] = parent.childNodes[x];
        }

        return out;
    }

    var cursor, closed = [], open = getChildNodes(parent);

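    // Depth-first walk: "open" holds nodes still to visit, "closed" the text nodes found.
    while (open.length) {
        cursor = open.shift();
        if (cursor.nodeType === 1) { // element node: visit its children next
            open.unshift.apply(open, getChildNodes(cursor));
        }
        if (cursor.nodeType === 3) { // text node: keep it
            closed.push(cursor);
        }
    }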

    return closed;
}

Using that function (or one like it), try this usage instead:

(function () {
    var x, i, re, rep,
        nodes = allTextNodes(document.body),
        filter = [ 'some', 'words', 'go', 'here' ];

    for (x = 0; x < nodes.length; x += 1) {
        for (i = 0; i < filter.length; i += 1) {
            re = new RegExp('\\b' + filter[i] + '\\b', 'g');
            rep = '****'; // fix this
            if (re.test(nodes[x].nodeValue)) {
                nodes[x].nodeValue = nodes[x].nodeValue.replace(re, rep);
            }
        }
    }
}());
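
As a rough way to try this approach outside a browser, the sketch below runs the same kind of traversal and replacement over plain objects that mimic DOM nodes. The mock tree, the names collectTextNodes, censor and makeText, and the fixed '***' replacement are assumptions made for illustration; only nodeType, childNodes and nodeValue mirror real DOM properties.

```javascript
// Recursive equivalent of allTextNodes: collects text nodes in document order.
function collectTextNodes(node, out) {
    var i, child;
    out = out || [];
    for (i = 0; i < node.childNodes.length; i += 1) {
        child = node.childNodes[i];
        if (child.nodeType === 3) {          // text node: keep it
            out.push(child);
        } else if (child.nodeType === 1) {   // element node: descend into it
            collectTextNodes(child, out);
        }
    }
    return out;
}

// Replace whole-word matches of each filter word in every text node.
function censor(root, words) {
    collectTextNodes(root).forEach(function (textNode) {
        words.forEach(function (word) {
            var re = new RegExp('\\b' + word + '\\b', 'g');
            textNode.nodeValue = textNode.nodeValue.replace(re, '***');
        });
    });
}

// Mock tree standing in for <body>text<a>ass-hole</a></body> (not a real DOM).
function makeText(value) {
    return { nodeType: 3, nodeValue: value, childNodes: [] };
}
var anchor = { nodeType: 1, childNodes: [makeText('ass-hole')] };
var body = {
    nodeType: 1,
    childNodes: [makeText("It's an ass of a time in class "), anchor]
};

censor(body, ['ass']);
```

The anchor object stays in the tree; only the nodeValue of its text child changes. Attributes such as href are not text nodes, so this approach leaves them untouched.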

Food for thought: what will happen if you have a filter word that contains a character that has meaning inside a regex? It seems unlikely in this case, but you should consider it all the same.
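
One common way to handle that case is to escape the filter word before building the pattern. This is only a sketch: escapeRegExp and wholeWordPattern are hypothetical helper names, not built-in functions, and the sample sentence is made up.

```javascript
// Escape characters that have special meaning inside a regular expression.
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build the whole-word pattern from an escaped filter word.
function wholeWordPattern(word) {
    return new RegExp('\\b' + escapeRegExp(word) + '\\b', 'g');
}

// Without escaping, the "." in "a.b" would match any character, including the "x" in "axb".
var censored = 'Use a.b here, not axb.'.replace(wholeWordPattern('a.b'), '***');
// censored === 'Use *** here, not axb.'
```

Keep in mind that \b only helps when the filter word begins and ends with a word character (a letter, digit or underscore).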

There's no way that Regex can be used to remove what you claim it removed. The problem is that the input isn't what you claim it is. If you add

alert(txt);

to your function, you'll see that you're actually passing

This is my text. It's an ass of a time in class ass-hole

to it. This is the body's text. Perhaps you want its innerHTML.
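
A quick way to check this without a browser is to run the same substitution on that string directly. The variable names here are just for illustration, and 'ass' stands in for filter[i].

```javascript
// The text that $('body').text(...) hands to the callback: the tags are already gone.
var bodyText = "This is my text. It's an ass of a time in class ass-hole";

// Same pattern as in the question.
var pattern = new RegExp('\\b' + 'ass' + '\\b', 'g');

var result = bodyText.replace(pattern, '***');
// result === "This is my text. It's an *** of a time in class ***-hole"
```

The regex only touches the two whole-word occurrences and leaves "class" alone, so the anchor was lost before the regex ever ran.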

Next time, please post a minimal, runnable demonstration of the problem up front. It's really bad when you say you have a problem doing a substitution, and the code doesn't perform any substitution.

The problem here is that you're matching \b on either side of the word. This means the word is required to be surrounded by certain characters, and '>' is not one of them.

So in your code, you need to change your regex to allow '>' on the left side and probably '<' on the right.

var pattern = new RegExp('(\\b|>)' + filter[i] + '(\\b|<)', 'g');

That is probably pretty close to what you need.

A JavaScript RegExp reference can be found here: http://www.javascriptkit.com/javatutors/redev2.shtml
