JavaScript - Efficiently find all elements containing one of a large set of strings

2022-12-27 18:58 Q&A author:

I have a set of strings and I need to find all of their occurrences in an HTML document. Where each string occurs matters, because I need to handle each case differently:

The string is all or part of an attribute value, e.g., for the string foo: <input value="foo"> -> add class ATTR to the element.
The string is the full text of an element, e.g., <button>foo</button> -> add class TEXT to the element.
The string appears inline in the text of an element, e.g., <p>I love foo</p> -> wrap the matching text in a span with class TEXT.

Also, I need to match the longest string first. e.g., if I have foo and foobar, then <p>I love foobar</p> should become <p>I love <span class="TEXT">foobar</span></p>, not <p>I love <span class="TEXT">foo</span>bar</p>.

The inline text is easy enough: Sort the strings descending by length and find and replace each in document.body.innerHTML with <span class="TEXT">$1</span>, although I'm not sure if that is the most efficient way to go.
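Instead of one innerHTML replace per string, the strings can be escaped, sorted longest first, and joined into a single alternation. At each position the regex engine tries the alternatives left to right, so the longest-first order gives the "foobar before foo" behaviour. This is a minimal sketch under those assumptions: escapeRegExp, buildMatcher and splitOnMatches are hypothetical names, matchAll needs ES2020, and it works on a text node's plain data rather than on innerHTML, so markup and attribute values are never touched. A caller would turn the segments into text nodes and span.TEXT elements.

```javascript
// Escape regex metacharacters so each search string is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build one alternation, longest strings first. At each position the regex
// engine tries the alternatives left to right, so "foobar" beats "foo".
// Assumes at least one non-empty search string.
function buildMatcher(strings) {
  const sorted = strings
    .filter(s => s.length > 0)
    .sort((a, b) => b.length - a.length);
  return new RegExp(sorted.map(escapeRegExp).join('|'), 'g');
}

// Split plain text into matched / unmatched segments in a single pass.
// matchAll works on a copy of the regex, so `matcher` can be reused.
function splitOnMatches(text, matcher) {
  const parts = [];
  let last = 0;
  for (const m of text.matchAll(matcher)) {
    if (m.index > last) parts.push({ text: text.slice(last, m.index), match: false });
    parts.push({ text: m[0], match: true });
    last = m.index + m[0].length;
  }
  if (last < text.length) parts.push({ text: text.slice(last), match: false });
  return parts;
}
```

For 'I love foobar', splitOnMatches(text, buildMatcher(['foo', 'foobar'])) yields an unmatched 'I love ' segment followed by a matched 'foobar' segment. The matcher is compiled once and each text node is scanned once, instead of rescanning the whole document for every string.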

For the attributes, I can do something like this:

sortedStrings.forEach(function(it) {
     // Scan the serialized HTML for name="...it..." pairs and mark the matching elements
     document.body.innerHTML.replace(new RegExp('(\\S+?)="[^"]*'+escapeRegExChars(it)+'[^"]*"','g'),function(s,attr) {
        $('['+attr+'*="'+it+'"]').addClass('ATTR');
     });
});

Again, that seems inefficient.
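Reading attribute values straight from the DOM avoids regex-parsing serialized HTML, and because each value is compared as plain text there is nothing to escape. A minimal sketch with a hypothetical helper name: for the ATTR case any match is enough, so longest-first ordering doesn't matter here, and String.prototype.includes does a literal substring test.

```javascript
// Return the names of attributes whose value contains any of the strings.
// `attrs` may be an element's NamedNodeMap (el.attributes) or any
// array-like of { name, value } objects; the indexed loop works for both.
function matchingAttributeNames(attrs, strings) {
  const names = [];
  for (let i = 0; i < attrs.length; i++) {
    const { name, value } = attrs[i];
    if (strings.some(s => value.includes(s))) names.push(name);
  }
  return names;
}
```

In a page this might be used as `if (matchingAttributeNames(el.attributes, strings).length) $(el).addClass('ATTR');`. With very many strings, the some/includes loop could be swapped for a single precompiled (non-global) alternation regex.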

Lastly, for the full text elements, a depth first search of the document that compares the innerHTML to each string will work, but for a large number of strings, it seems very inefficient.
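Comparing every element against every string costs O(elements × strings). Assuming exact matches after trimming whitespace, the strings can go into a Set once, so each element needs a single hash lookup. A sketch with a hypothetical helper name:

```javascript
// Build the lookup once; afterwards each check is one Set lookup,
// independent of how many strings there are.
function makeFullTextMatcher(strings) {
  const lookup = new Set(strings.map(s => s.trim()));
  return text => lookup.has(text.trim());
}
```

In a page you would pass something like el.textContent, but only for elements whose sole child is a text node; textContent ignores markup, so <p>foo<b></b></p> would otherwise count as a full-text match.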

Any answer that offers performance improvements gets an upvote :)

EDIT: I went with a modification of Bob's answer. delim is an optional delimiter around each string (to distinguish it from normal text), and keys is the list of strings.

function dfs(iterator,scope) {
    scope = scope || document.body;
    $(scope).children().each(function() {
        return dfs(iterator,this);
    });
    return iterator.call(scope);
}

var escapeChars = /[.*+?^${}()|[\]\\\/]/g;
function safe(text) {
    // '$&' is the whole match, so each metacharacter gets a backslash in front
    return text.replace(escapeChars, '\\$&');
}

function eachKey(iterator) {
    var key, lit, i, len, exp;
    for(i = 0, len = keys.length; i < len; i++) {
        key = keys[i].trim();
        lit = (delim + key + delim);
        exp = new RegExp(delim + '(' + safe(key) + ')' + delim,'g');            
        iterator(key,lit,exp);
    }
}

$(function() {
    keys = keys.sort(function(a,b) {
        return b.length - a.length;
    });

    dfs(function() {
        var a, attr, html, val, el = $(this);
        eachKey(function(key,lit,exp) {
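            // check attributes (indexed loop: for...in would also visit
            // NamedNodeMap members such as length and item)
            for(a = 0; a < el[0].attributes.length; a++) {
                attr = el[0].attributes[a].name;
                val = el[0].attributes[a].value;
                exp.lastIndex = 0; // exp has the g flag, so test() is stateful
                if(exp.test(val)) {
                    // set the attribute first so a matching class attribute doesn't drop attrClass
                    el.attr(attr,val.replace(exp,"$1"));
                    el.addClass(attrClass);
                }
            }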
            // check all content
            html = el.html().trim();
            if(html === lit) {
                el.addClass(theClass);
                el.html(key); // remove delims
            } else if(exp.test(html)) {
                // check partial content
                el.html(html.replace(exp,wrapper));
            }
        });
    });
});

Under the assumption that the traversal is the most expensive operation, this seems optimal, although improvements are still welcome.
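One concrete improvement: eachKey runs inside the dfs callback, so every RegExp is rebuilt for every element. Assuming keys and delim don't change during the traversal, the list can be sorted and compiled once up front. A sketch with a hypothetical compileKeys helper, which also escapes delim (the code above inserts it into the pattern unescaped):

```javascript
// Escape regex metacharacters so keys and the delimiter match literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Trim, sort (longest first) and compile the keys once, before traversal.
// Each entry has the same key / lit / exp shape that eachKey passes on.
function compileKeys(keys, delim) {
  const d = escapeRegExp(delim);
  return keys
    .map(k => k.trim())
    .filter(k => k.length > 0)
    .sort((a, b) => b.length - a.length)
    .map(key => ({
      key,
      lit: delim + key + delim,
      exp: new RegExp(d + '(' + escapeRegExp(key) + ')' + d, 'g'),
    }));
}
```

Inside the dfs callback you would then loop over this precompiled array instead of calling eachKey. Each exp still carries the g flag, so test() leaves lastIndex advanced after a match; setting exp.lastIndex = 0 before each test() keeps reuse across elements safe.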

Trying to parse HTML with regex is a mug's game. It simply can't handle even the basic structures of HTML, never mind the quirks. There's so much wrong with your snippet already. (It doesn't detect unquoted attributes; it fails for a wide variety of punctuation in the string due to the lack of HTML-escaping, regex-escaping or CSS-escaping(*); it fails for attribute names containing -; and it strangely calls replace without using its result...)

So, use the DOM. Yes, that'll mean a traversal. But then so does a selector like the [attr*=] you're using already.

var needle= 'foo';

$('*').each(function() {
    var tag= this.tagName.toLowerCase();
    if (tag==='script' || tag==='style' || tag==='textarea' || tag==='option') return;

    // Find text in attribute values
    //
    for (var attri= this.attributes.length; attri-->0;)
        if (this.attributes[attri].value.indexOf(needle)!==-1)
            $(this).addClass('ATTR');

    // Find text in child text nodes
    //
    for (var childi= this.childNodes.length; childi-->0;) {
        var child= this.childNodes[childi];
        if (child.nodeType!=3) continue;

        // Sole text content of parent: add class directly to parent
        //
        if (child.data===needle && this.childNodes.length===1) {
            $(this).addClass('TEXT');
            break;
        }

        // Else find index of each occurence in text, and wrap each in span
        //
        var parts= child.data.split(needle);
        for (var parti= parts.length; parti-->1;) {
            var span= document.createElement('span');
            span.className= 'TEXT';
            var ix= child.data.length-parts[parti].length;
            var trail= child.splitText(ix);
            span.appendChild(child.splitText(ix-needle.length));
            this.insertBefore(span, trail);
        }
    }
});

(The reverse-loops are necessary as this is a destructive iteration of content.)
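The offset arithmetic in the reverse loop can be checked without a DOM. This hypothetical helper replays it on a plain string: each step only shortens the original text node from its end, so offsets computed for earlier occurrences stay valid, whereas a forward loop would have to track how much text had already been moved into spans. It assumes a non-empty needle.

```javascript
// Return [start, end) ranges of each occurrence of needle in text, in the
// order the reverse loop visits them (last occurrence first).
function occurrenceRangesFromEnd(text, needle) {
  const parts = text.split(needle);
  const ranges = [];
  let length = text.length; // length of the original node's remaining text
  for (let i = parts.length - 1; i >= 1; i--) {
    const end = length - parts[i].length; // corresponds to `ix` in the loop
    const start = end - needle.length;
    ranges.push([start, end]);
    length = start; // splitText(start) leaves text[0..start) in the node
  }
  return ranges;
}
```

For 'a foo b foo c' and 'foo', this returns [[8, 11], [2, 5]], the same two cut points the splitText calls use.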

(*: The global escape() function doesn't do any of those things. It's more like URL-encoding, but it's not really that either. It's almost always the wrong thing; avoid it.)
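To make the HTML-escaping point concrete: innerHTML returns serialized markup, in which text-node content has &, <, > and non-breaking spaces escaped, so searching it for a raw string like Tom & Jerry finds nothing. A hypothetical helper applying the same text-node escaping before searching is sketched below. Attribute values are serialized with different rules (for example " becomes &quot;), which is one more reason the DOM route is simpler.

```javascript
// Escape a string the way HTML serialization escapes text-node content.
// '&' must be replaced first so the entities added later aren't re-escaped.
function escapeAsSerializedText(s) {
  return s
    .replace(/&/g, '&amp;')
    .replace(/\u00A0/g, '&nbsp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}
```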

There is really no good way to do this. Your last requirement forces you to traverse the entire DOM.

For the first two requirements, I would select all elements by tag name and iterate over them, inserting the markup as needed.

The only performance improvement I can think of is to do this on the server side at all costs, even if that means an extra POST so your faster server can do the work; otherwise this can be really slow in, say, IE6.
