
Find word in HTML

I am trying to find a given word in an HTML string and add a span around it.

What I am doing now is this:

function find(what:String,where:String)
{
    var regexp:RegExp=new RegExp(what,'gi');
    return where.replace(regexp,'<span>$&</span>');
}

It works well for words that are not inside HTML tags, but it also matches text inside tags (attribute values, for example). What I want is to ignore the matches that are inside HTML tags.

Example: find("spain")

Input:

The rain in <b class="spain">Spain</b> stays mainly in the <i data-test="Spain">plain</i>.

Output:

The rain in <b class="spain"><span>Spain</span></b> stays mainly in the <i data-test="Spain">plain</i>.

How can I achieve this, please?


To account for HTML tags and attributes that could match, you are going to need to parse that HTML one way or another. The easiest way is to add it to the DOM (or just to a new element):

var container = document.createElement("div");
container.style.display = "none";
document.body.appendChild(container);  // this step is optional
container.innerHTML = where;

Once parsed, you can now iterate the nodes using DOM methods and find just the text nodes and search on those. Use a recursive function to walk the nodes:

function wrapWord(el, word)
{
    var expr = new RegExp(word, "gi"); // "g" so that match() returns every occurrence, not just the first
    var nodes = [].slice.call(el.childNodes, 0);
    for (var i = 0; i < nodes.length; i++)
    {
        var node = nodes[i];
        if (node.nodeType == 3) // textNode
        {
            var matches = node.nodeValue.match(expr);
            if (matches)
            {
                var parts = node.nodeValue.split(expr);
                for (var n = 0; n < parts.length; n++)
                {
                    if (n)
                    {
                        var span = el.insertBefore(document.createElement("span"), node);
                        span.appendChild(document.createTextNode(matches[n - 1]));
                    }
                    if (parts[n])
                    {
                        el.insertBefore(document.createTextNode(parts[n]), node);
                    }
                }
                el.removeChild(node);
            }
        }
        else
        {
            wrapWord(node, word);
        }
    }
}

Here's a working demo: http://jsfiddle.net/gilly3/J8JJm/3
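
The split-and-rebuild step inside wrapWord can also be tried without a DOM. The following is a minimal sketch, not part of the original answer: splitMatches is a hypothetical helper that cuts one text node's value into plain and matched segments, which is the same sequence a DOM version would turn into text nodes and spans. It assumes the word should be matched literally, so regex metacharacters are escaped first.

```javascript
// Hypothetical helper (not from the answer): split a text value into
// alternating plain and matched segments, case-insensitively.
function splitMatches(text, word) {
    if (!word) {
        return text ? [{ text: text, match: false }] : [];
    }
    // Escape regex metacharacters so the word is matched literally.
    var escaped = word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    var expr = new RegExp(escaped, "gi");
    var segments = [];
    var last = 0;
    var m;
    while ((m = expr.exec(text)) !== null) {
        if (m.index > last) {
            // Plain text between the previous match and this one.
            segments.push({ text: text.slice(last, m.index), match: false });
        }
        segments.push({ text: m[0], match: true });
        last = m.index + m[0].length;
    }
    if (last < text.length) {
        // Trailing plain text after the final match.
        segments.push({ text: text.slice(last), match: false });
    }
    return segments;
}
```

Each segment with match set to true would become a span, and every other segment a plain text node. The original casing of each occurrence is kept because the segment text comes from the input, not from the search word.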


You won't be able to process HTML in any reliable way using regex. Instead, parse the HTML into a DOM tree and iterate the Text nodes checking their data for content.

If you are using JavaScript in a web browser, the parsing will already have been done for you. See this question for example wrap-word-in-span code. It's much trickier if you need to match phrases that might be split across different elements.
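
The "iterate the Text nodes" step could look roughly like the sketch below. collectTextNodes is a hypothetical name, and the function only reads nodeType and childNodes, so it should work on real DOM nodes as well as on plain objects with the same shape (handy for trying it outside a browser). Attribute values never appear as text nodes, which is why this approach does not touch class="spain".

```javascript
// Hypothetical helper: gather every text node (nodeType 3) below a root node.
function collectTextNodes(node, out) {
    out = out || [];
    if (node.nodeType === 3) {
        out.push(node);
    } else if (node.childNodes) {
        // Elements (and anything else with children) are walked recursively.
        for (var i = 0; i < node.childNodes.length; i++) {
            collectTextNodes(node.childNodes[i], out);
        }
    }
    return out;
}

// Plain-object stand-in for: The rain in <b class="spain">Spain</b>
var fakeRoot = {
    nodeType: 1,
    childNodes: [
        { nodeType: 3, nodeValue: "The rain in ", childNodes: [] },
        { nodeType: 1, childNodes: [
            { nodeType: 3, nodeValue: "Spain", childNodes: [] }
        ] }
    ]
};
var textValues = collectTextNodes(fakeRoot).map(function (n) {
    return n.nodeValue;
});
```

In a browser, passing document.body (or a detached container element) instead of fakeRoot gives the list of nodes whose data can then be searched.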


function find(what:String,where:String)
{
    what = what.replace(/(\[|\\|\^|\$|\.|\||\?|\*|\+|\(|\)|\{|\})/g, "\\$1")
          .replace(/</g,"&lt;?").replace(/>/g,"&gt;?").replace(/"/g,"(?:\"|&quot;?)")
          .replace(/[^a-zA-Z0-9\s:;'"~[\]\{\}\-_+=(),.<>*\/!@#$%^&|\\?]/g, "(?:&[0-9A-Za-z]{3,25};|&#[0-9]{1,10};?|[^\\s<])")
          .replace(/\s/g, "(?:\\s|&nbsp;?)");

    what = "(>[^<]*|^[^<]*)(" + what + ")";
    var regexp:RegExp=new RegExp(what,'gi');
    return where.replace(regexp,'$1<span>$2</span>');
}
  1. The first replace call adds a backslash before characters that have a special meaning in a RE, to prevent errors or unexpected results.
  2. The second, third and fourth replace calls turn <, > and " into patterns that match the corresponding HTML entities, so that the search query will not search through tags. They run before step 3 because the pattern inserted there contains a literal <, which must not be rewritten.
  3. The fifth replace call replaces every character outside the listed set with (?:&[0-9A-Za-z]{3,25};|&#[0-9]{1,10};?|[^\s<]). This RE consists of three parts: first, it tries to match a named HTML entity. Second, it attempts to match a numeric HTML entity. Finally, it matches any character other than white-space or < (in case the creator of the HTML document didn't properly encode the characters).
  4. The sixth replace call replaces white-space with the RE (?:\s|&nbsp;?), which matches a white-space character or the &nbsp; entity.
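
As an illustration of the escaping step and the white-space step only, here is a reduced sketch. buildPattern is a hypothetical name, and the metacharacter class is the commonly used one from the MDN regular expressions guide, not the exact expression in the function above.

```javascript
// Hypothetical helper: turn a plain search query into a regex source string.
function buildPattern(query) {
    return query
        // Escape characters that have a special meaning in a regular expression.
        .replace(/[.*+?^${}()|[\]\\]/g, "\\$&")
        // Let each white-space character also match the &nbsp; entity.
        .replace(/\s/g, "(?:\\s|&nbsp;?)");
}

var cityPattern = new RegExp(buildPattern("New York"), "i");
var sumPattern = new RegExp(buildPattern("1+1"));
```

With this, cityPattern should match both "New York" and "new&nbsp;york", and sumPattern should match the literal text "1+1" instead of treating + as a quantifier.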

One shortcoming of this function is that characters outside the listed set (such as €) match any HTML entity or character: following the example, not only &euro; and € are valid matches, but also &pound; and @.

This proposed solution works in most cases. It can be inaccurate in complex situations, but it is probably no worse than a DOM iteration (which is very susceptible to memory leaks and requires more computing power).

When you work with HTML elements that have event listeners assigned through the DOM, rebuilding the markup from a string discards those listeners. In that case you should iterate through all (child) elements and apply this function to every Text node.
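
A simpler string-only variant, offered as an alternative sketch and not as this answer's method, is to split the markup into tags and text first and run the replacement on the text pieces only. Because the group (>[^<]*|^[^<]*) consumes the text before a match, the function above wraps at most one occurrence per run of text; splitting first avoids that. findInText is a hypothetical name, and the sketch assumes that no attribute value contains a > character and that comments, script and style content need no special handling.

```javascript
// Hypothetical alternative: wrap a word in <span> only outside of tags.
function findInText(what, where) {
    if (!what) {
        return where;
    }
    // Escape regex metacharacters so the word is matched literally.
    var escaped = what.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    var expr = new RegExp(escaped, "gi");
    // Splitting on a capturing group keeps the tags in the result:
    // even indexes hold text, odd indexes hold tags.
    var parts = where.split(/(<[^>]*>)/);
    for (var i = 0; i < parts.length; i += 2) {
        parts[i] = parts[i].replace(expr, "<span>$&</span>");
    }
    return parts.join("");
}
```

For the input from the question, findInText("spain", input) should leave class="spain" and data-test="Spain" untouched and wrap only the visible word. Entity-aware matching (&nbsp;, &euro; and so on) is not covered by this sketch.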


  • Pure JavaScript (based on Sizzle.getText from jQuery); Demo: http://jsfiddle.net/vol7ron/U8LLv/

    var wrapText = function ( elems,regex ) {
        var re = new RegExp(regex);
        var elem;
    
        for ( var i = 0; elems[i]; i++ ) {
            elem = elems[i];
    
            // Get the text from text nodes and CDATA nodes
            if ( elem.nodeType === 3 || elem.nodeType === 4 ) {
                var parent = elem.parentNode;   // declared locally instead of leaking a global
                re.lastIndex = 0;
                if(re.test(elem.nodeValue)){               
                    var span = document.createElement('span');
                    span.innerHTML = RegExp.$1;
    
                    if (RegExp.leftContext != ''){
                       parent.insertBefore(document.createTextNode(RegExp.leftContext),elem);    i++;
                    }
    
                    parent.insertBefore(span,elem);   i++;
    
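                    if (RegExp.rightContext != ''){
                       // no i++ here: the remaining text is visited next, so later matches are wrapped too
                       parent.insertBefore(document.createTextNode(RegExp.rightContext),elem);
                    }
    
                    parent.removeChild(elem);
                    i--;   // step back onto the span so the loop's i++ does not skip the following node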
                }                   
    
            // Traverse everything else, except comment nodes
            } else if ( elem.nodeType !== 8 ) {
                wrapText( elem.childNodes, regex );
            }
        }
    
        return;
    };
    
    
    var obj = document.getElementById('wrapper');
    wrapText([obj],/(spain)/gi);
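
RegExp.$1, RegExp.leftContext and RegExp.rightContext are deprecated legacy static properties. As a hedged sketch, the same three pieces can be derived from the result of exec; splitAroundMatch is a hypothetical helper, and it uses the whole match (m[0]) where the code above uses the first capture group.

```javascript
// Hypothetical helper: return the text before, inside and after the first match.
function splitAroundMatch(text, re) {
    re.lastIndex = 0;              // a global regex remembers its last position
    var m = re.exec(text);
    if (m === null) {
        return null;
    }
    return {
        left: text.slice(0, m.index),             // replaces RegExp.leftContext
        match: m[0],                              // replaces RegExp.$1 for /(spain)/
        right: text.slice(m.index + m[0].length)  // replaces RegExp.rightContext
    };
}
```

Inside wrapText, the returned left, match and right values would take the place of the three static properties.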
    