Find word in HTML

2023-04-04 07:55

I am trying to find a given word in an HTML string and wrap a span around it.

What I am doing now is this:

function find(what:String,where:String)
{
    var regexp:RegExp=new RegExp(what,'gi');
    return where.replace(regexp,'<span>$&</span>');
}

It works well on words that are not inside HTML tags. What I want is to ignore those that are inside HTML tags.

Example: find("spain")

Input:

The rain in <b class="spain">Spain</b> stays mainly in the <i data-test="Spain">plain</i>.

Output:

The rain in <b class="spain"><span>Spain</span></b> stays mainly in the <i data-test="Spain">plain</i>.

How can I achieve this, please?

To account for HTML tags and attributes that could match, you are going to need to parse that HTML one way or another. The easiest way is to add it to the DOM (or just to a new element):

var container = document.createElement("div");
container.style.display = "none";
document.body.appendChild(container);  // this step is optional
container.innerHTML = where;

Once parsed, you can now iterate the nodes using DOM methods and find just the text nodes and search on those. Use a recursive function to walk the nodes:

function wrapWord(el, word)
{
    // "g" makes match() return every occurrence, so matches[] lines up with the parts from split()
    var expr = new RegExp(word, "gi");
    var nodes = [].slice.call(el.childNodes, 0);
    for (var i = 0; i < nodes.length; i++)
    {
        var node = nodes[i];
        if (node.nodeType == 3) // textNode
        {
            var matches = node.nodeValue.match(expr);
            if (matches)
            {
                var parts = node.nodeValue.split(expr);
                for (var n = 0; n < parts.length; n++)
                {
                    if (n)
                    {
                        var span = el.insertBefore(document.createElement("span"), node);
                        span.appendChild(document.createTextNode(matches[n - 1]));
                    }
                    if (parts[n])
                    {
                        el.insertBefore(document.createTextNode(parts[n]), node);
                    }
                }
                el.removeChild(node);
            }
        }
        else
        {
            wrapWord(node, word);
        }
    }
}

Here's a working demo: http://jsfiddle.net/gilly3/J8JJm/3
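
One caveat with wrapWord above: the word is passed straight to new RegExp, so a search term containing characters such as . or + is read as a pattern rather than as literal text (and a term like c++ makes new RegExp throw a SyntaxError). A minimal sketch of escaping the term first, assuming you want a purely literal search; escapeRegExp is a hypothetical helper name, not part of the answer:

```javascript
// Hypothetical helper: put a backslash before every character that has a
// special meaning in a regular expression.
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

var risky = new RegExp("a.c", "i");               // "." matches any character
var safe  = new RegExp(escapeRegExp("a.c"), "i"); // "." matches only a literal dot

console.log(risky.test("abc")); // true
console.log(safe.test("abc"));  // false
console.log(safe.test("A.C"));  // true
```

Inside wrapWord this would mean building the expression as new RegExp(escapeRegExp(word), "gi").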

You won't be able to process HTML in any reliable way using regex. Instead, parse the HTML into a DOM tree and iterate the Text nodes checking their data for content.

If you are using JavaScript in a web browser, the parsing will already have been done for you. See this question for example wrap-word-in-span code. It's much trickier if you need to match phrases that might be split across different elements.
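
To illustrate why phrases split across elements are trickier, here is a rough sketch (not taken from the linked question) that concatenates the text nodes, searches the joined string, and maps each match back to slices of the individual nodes. The names collectTextNodes and locateMatches are hypothetical, and the plain-object tree only stands in for real DOM nodes so the snippet runs outside a browser:

```javascript
// Hypothetical helper: gather text nodes in document order. It reads only
// nodeType, nodeValue and childNodes, so it accepts real DOM nodes as well
// as plain objects shaped like them.
function collectTextNodes(root, out) {
    out = out || [];
    for (var i = 0; i < root.childNodes.length; i++) {
        var node = root.childNodes[i];
        if (node.nodeType === 3) {
            out.push(node);
        } else if (node.nodeType === 1) {
            collectTextNodes(node, out);
        }
    }
    return out;
}

// Hypothetical helper: search the concatenated text and report, for every
// match, which slice of which text node it covers.
function locateMatches(textNodes, re) {
    var joined = "";
    var starts = [];
    for (var i = 0; i < textNodes.length; i++) {
        starts.push(joined.length);
        joined += textNodes[i].nodeValue;
    }
    // Work on a global copy so exec() steps through every occurrence.
    var flags = re.flags.indexOf("g") === -1 ? re.flags + "g" : re.flags;
    var scan = new RegExp(re.source, flags);
    var results = [];
    var m;
    while ((m = scan.exec(joined)) !== null) {
        if (m[0] === "") { scan.lastIndex++; continue; } // skip empty matches
        var from = m.index;
        var to = m.index + m[0].length;
        var pieces = [];
        for (var n = 0; n < textNodes.length; n++) {
            var s = Math.max(from, starts[n]);
            var e = Math.min(to, starts[n] + textNodes[n].nodeValue.length);
            if (s < e) {
                pieces.push({ node: textNodes[n], start: s - starts[n], end: e - starts[n] });
            }
        }
        results.push(pieces);
    }
    return results;
}

// Stand-in for <div>The rain in <b>Spa</b>in stays</div>
var tree = { nodeType: 1, childNodes: [
    { nodeType: 3, nodeValue: "The rain in " },
    { nodeType: 1, childNodes: [ { nodeType: 3, nodeValue: "Spa" } ] },
    { nodeType: 3, nodeValue: "in stays" }
] };

var hits = locateMatches(collectTextNodes(tree), /spain/i);
console.log(hits.length);    // 1
console.log(hits[0].length); // 2 (the match spans two text nodes)
```

Each piece would then need its own span, because a single span cannot straddle the closing </b>.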

function find(what:String,where:String)
{
    what = what.replace(/(\[|\\|\^|\$|\.|\||\?|\*|\+|\(|\)|\{|\})/g, "\\$1")
          .replace(/[^a-zA-Z0-9\s:;'"~[\]\{\}\-_+=(),.<>*\/!@#$%^&|\\?]/g, "(?:&[0-9A-Za-z]{3,25};|&#[0-9]{1,10};?|[^\s<])")
          .replace(/</g,"&lt;?").replace(/>/g,"&gt;?").replace(/"/g,"(?:\"|&quot;?)")
          .replace(/\s/g, "(?:\\s|&nbsp;?)");

    what = "(>[^<]*|^[^<]*)(" + what + ")";
    var regexp:RegExp=new RegExp(what,'gi');
    return where.replace(regexp,'$1<span>$2</span>');
}

The first replace call adds a backslash before characters that have a special meaning in a regular expression, to prevent errors or unexpected results.
The second replace call substitutes every character of the search query that is not in the listed set with (?:&[0-9A-Za-z]{3,25};|&#[0-9]{1,10};?|[^\s<]). This pattern has three parts: first, it tries to match a named HTML entity; second, a numeric HTML entity; finally, any non-whitespace character (in case the creator of the HTML document didn't properly encode the character).
The third, fourth and fifth replace calls replace <, > and " with patterns for the corresponding HTML entities, so that the search query will not search through tags.
The sixth replace call replaces each whitespace character with the pattern (?:\s|&nbsp;?), which matches a whitespace character or the &nbsp; entity.

One shortcoming of this function is that characters outside the listed set (such as €) match any HTML entity or character (following the example, not only &euro; and € are valid matches, but also £ and @).
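
The sixth step described above can be tried in isolation. This is only a sketch: allowNbsp is a hypothetical name, and it assumes the query has already had its regex metacharacters escaped by the earlier steps.

```javascript
// Let each whitespace character in the query match either real whitespace
// or the &nbsp; entity (with or without its trailing semicolon).
// Assumption: `query` contains no unescaped regex metacharacters.
function allowNbsp(query) {
    return query.replace(/\s/g, "(?:\\s|&nbsp;?)");
}

var pattern = allowNbsp("the rain");
console.log(pattern); // the(?:\s|&nbsp;?)rain

var re = new RegExp(pattern, "i");
console.log(re.test("The&nbsp;rain")); // true
console.log(re.test("The rain"));      // true
console.log(re.test("Therain"));       // false
```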

This solution is adequate in most cases. It can be inaccurate in complex situations, but that is probably no worse than a DOM iteration (which is very susceptible to memory leaks and requires more computing power).
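
To see what the (>[^<]*|^[^<]*) prefix does, here is a stripped-down sketch in plain JavaScript that keeps only that part of the function. wrapOutsideTags is a hypothetical name, and the sketch assumes the search word contains no regex metacharacters or entities:

```javascript
// Assumption: `what` is a plain word, so no escaping or entity handling is done.
function wrapOutsideTags(what, where) {
    // Group 1 must start at ">" or at the start of the string and may not
    // cross a "<", so the word is only found in text that follows a tag.
    var re = new RegExp("(>[^<]*|^[^<]*)(" + what + ")", "gi");
    return where.replace(re, "$1<span>$2</span>");
}

var input = 'The rain in <b class="spain">Spain</b> stays mainly in the ' +
            '<i data-test="Spain">plain</i>.';
console.log(wrapOutsideTags("spain", input));
// The rain in <b class="spain"><span>Spain</span></b> stays mainly in the <i data-test="Spain">plain</i>.

console.log(wrapOutsideTags("spain", "spain and spain"));
// spain and <span>spain</span>
```

The second call shows one of the complex situations: group 1 is greedy and consumes the text before the match, so when a word occurs more than once in the same run of text, only the last occurrence is wrapped.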

When you work with HTML elements that have event listeners assigned through the DOM, replacing their HTML as a string would discard those listeners. In that case, iterate through all (child) elements and apply this function to every Text node.

Pure JavaScript (based on Sizzle.getText from jQuery); Demo: http://jsfiddle.net/vol7ron/U8LLv/

var wrapText = function ( elems,regex ) {
    var re = new RegExp(regex);
    var elem;

    for ( var i = 0; elems[i]; i++ ) {
        elem = elems[i];

        // Get the text from text nodes and CDATA nodes
        if ( elem.nodeType === 3 || elem.nodeType === 4 ) {
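            var parent = elem.parentNode;   // local variable; the original leaked a global
            re.lastIndex = 0;
            var match = re.exec(elem.nodeValue);
            if (match) {
                var text  = elem.nodeValue;
                var left  = text.slice(0, match.index);
                var right = text.slice(match.index + match[0].length);

                // Build the span from a text node so the match is never re-parsed as markup
                var span = document.createElement('span');
                span.appendChild(document.createTextNode(match[0]));

                if (left != ''){
                   parent.insertBefore(document.createTextNode(left),elem);    i++;
                }

                parent.insertBefore(span,elem);   i++;

                if (right != ''){
                   // No i++ here: the loop revisits this node to find further matches
                   parent.insertBefore(document.createTextNode(right),elem);
                }

                parent.removeChild(elem);
                i--;   // elems is a live list; step back so the loop's i++ lands on the node after the span
            }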

        // Traverse everything else, except comment nodes
        } else if ( elem.nodeType !== 8 ) {
            wrapText( elem.childNodes, regex );
        }
    }

    return;
};


var obj = document.getElementById('wrapper');
wrapText([obj],/(spain)/gi);
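
The string handling at the core of both DOM answers (cutting a text node's value into alternating plain and matched pieces) can be tried without a DOM. A minimal sketch, with splitByMatches as a hypothetical helper name:

```javascript
// Split text into ordered pieces, each flagged as a match or as plain text.
function splitByMatches(text, re) {
    // Work on a global copy so exec() advances through every occurrence.
    var flags = re.flags.indexOf("g") === -1 ? re.flags + "g" : re.flags;
    var scan = new RegExp(re.source, flags);
    var pieces = [];
    var last = 0;
    var m;
    while ((m = scan.exec(text)) !== null) {
        if (m[0] === "") { scan.lastIndex++; continue; } // skip empty matches
        if (m.index > last) {
            pieces.push({ text: text.slice(last, m.index), match: false });
        }
        pieces.push({ text: m[0], match: true });
        last = m.index + m[0].length;
    }
    if (last < text.length) {
        pieces.push({ text: text.slice(last), match: false });
    }
    return pieces;
}

var pieces = splitByMatches("Spain, spain and plain", /spain/i);
console.log(pieces.length); // 4
```

In a browser, each piece with match set to true would become a span holding a text node, and each other piece a bare text node, inserted in order in place of the original node.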
