
Improve Regex to match duplicates in a list

I am using a regex to find duplicates in a list. It is only a short comma-separated list, and performance is not an issue, so there is no need to tell me I should not use regex for those reasons.

// returns a match because some is repeated
"some,thing,here,some,whatever".match(/(^|,)(.+?)(,|,.+,)\2(,|$)/g)

Questions...

  1. Can this regex be improved?
  2. Does it cover all possible scenarios, assuming no comma appears inside the separated strings?
  3. Is there a better (preferably more readable and more efficient) way to do this?
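For reference, one way to tighten the pattern is sketched below, under the question's own assumption that elements are non-empty and never contain commas. The original pattern lets `(.+?)` span commas and consumes the comma after the second occurrence, so with `/g` it can miss duplicates that sit next to each other: `"a,a,b,b".match(/(^|,)(.+?)(,|,.+,)\2(,|$)/g)` returns only `["a,a,"]`. The variant restricts each element to `[^,]+` and finds the later occurrence with a lookahead, so nothing after the candidate element is consumed. `findDupsWithRegex` is a hypothetical helper name, and `matchAll` assumes an ES2020 runtime.

```javascript
// Sketch of a tighter regex, assuming elements are non-empty and comma-free.
//   (?:^|,)        start of the list or the comma before an element
//   ([^,]+)        one whole element (it cannot span a comma)
//   (?=,(?:[^,]*,)*\1(?:,|$))
//                  lookahead: the same whole element appears later,
//                  checked without consuming it
const dupPattern = /(?:^|,)([^,]+)(?=,(?:[^,]*,)*\1(?:,|$))/g;

// findDupsWithRegex is a hypothetical helper name.
function findDupsWithRegex(list) {
  // matchAll yields every match with its capture groups;
  // m[1] is the element without the leading comma
  const hits = Array.from(list.matchAll(dupPattern), (m) => m[1]);
  // An element that occurs three times matches twice, so de-duplicate
  return [...new Set(hits)];
}

findDupsWithRegex("some,thing,here,some,whatever"); // ["some"]
findDupsWithRegex("a,a,b,b");                       // ["a", "b"]
```

Even in this form the pattern is harder to read than the non-regex approaches, which is the trade-off the question is asking about.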


I don't see the purpose of using regexes here, unless you like unimaginable pain. If I had to find duplicates, I would:

  • Obtain an array of words

    var words = "...".split(',');
    
  • Optionally lowercase everything, if you want a case-insensitive comparison

  • Sort the array

    words.sort();
    
  • Duplicates are now all in consecutive positions of the array, so a single pass comparing each element with the one before it finds them.

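As an extra advantage, I'm pretty sure this would be vastly more efficient than a regex version: sorting needs only O(n log n) comparisons, while the backtracking regex has to rescan the rest of the string for every candidate element.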
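The steps above can be sketched as a small function. This is a minimal sketch, not the answerer's own code; `findDupsBySorting` and its `ignoreCase` flag are hypothetical names introduced here for illustration.

```javascript
// Minimal sketch of the split / lowercase / sort / scan steps.
// findDupsBySorting and ignoreCase are hypothetical names.
function findDupsBySorting(list, ignoreCase) {
  // Obtain an array of words
  let words = list.split(",");
  // Optionally lowercase everything
  if (ignoreCase) {
    words = words.map((w) => w.toLowerCase());
  }
  // Sort the array; the default sort compares strings by UTF-16 code units
  words.sort();
  // Equal words are now adjacent, so compare each word with its predecessor
  const dups = [];
  for (let i = 1; i < words.length; i++) {
    // Record each duplicated word only once, even if it occurs 3+ times
    if (words[i] === words[i - 1] && dups[dups.length - 1] !== words[i]) {
      dups.push(words[i]);
    }
  }
  return dups;
}

findDupsBySorting("some,thing,here,some,whatever"); // ["some"]
findDupsBySorting("a,B,b,a,a", true);                // ["a", "b"]
```

The sort dominates the cost; the final scan touches each word once.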


If I wanted to find dups in a comma separated list, I'd do it like this, using the hash capabilities of an object to accumulate unique values and detect dups:

function getDups(list) {
    var data = list.split(",");
    // Object.create(null) makes objects with no prototype, so items such as
    // "constructor" or "toString" are not mistaken for ones already seen
    var uniques = Object.create(null), dups = Object.create(null), item, uniqueList = [];
    for (var i = 0; i < data.length; i++) {
        item = data[i];
        if (uniques[item]) {
            // found dup
            dups[item] = true;
        } else {
            // found unique item
            uniques[item] = true;
        }
    }
    // at the end here, you'd have an object called uniques with each distinct item in it once,
    // and an object called dups with every item that appeared more than once
    // you could turn either one back into an array easily if you wanted to
    // Since it uses the object hash for dup detection, it scales to large numbers of items just fine
    // you can return whatever you want here (a new list, a list of dups, etc...)
    // in this implementation, I chose to return an array of unique values
    // (for...in lists integer-like keys such as "42" first, so the order may differ from the input)
    for (var key in uniques) {
        uniqueList.push(key);
    }
    return uniqueList;    // return array of unique values
}

var list = "some,thing,here,some,whatever";
getDups(list);    // returns ["some", "thing", "here", "whatever"]

Here's a jsFiddle that shows it working: http://jsfiddle.net/jfriend00/NGQCz/

This type of implementation scales well with large numbers of words because each lookup in the object hash takes roughly constant time on average, so the whole pass is roughly linear in the number of words.
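The comments in `getDups` mention that it could return a list of dups instead. A sketch of that variant follows, assuming an ES2015+ runtime with `Set`; `getDupList` is a hypothetical name, not part of the original answer. A `Set` has no inherited keys and iterates in insertion order, so it sidesteps both the prototype pitfall and the `for...in` ordering rule.

```javascript
// Sketch: same single-pass hash idea as getDups, but using Set objects
// (assumes ES2015+) and returning the duplicated values instead.
// getDupList is a hypothetical name.
function getDupList(list) {
  const seen = new Set(); // every value encountered so far
  const dups = new Set(); // values encountered more than once
  for (const item of list.split(",")) {
    if (seen.has(item)) {
      dups.add(item);
    } else {
      seen.add(item);
    }
  }
  // Duplicates come out in the order their second occurrence appears
  return Array.from(dups);
}

getDupList("some,thing,here,some,whatever"); // ["some"]
getDupList("b,a,b,a,constructor");           // ["b", "a"]
```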
