Improve Regex to match duplicates in a list
I am using a regex to find duplicates in a list. It is only a short comma-separated list, and performance is not an issue, so there is no need to tell me I should not use regex for those reasons.
// returns a match because some is repeated
"some,thing,here,some,whatever".match(/(^|,)(.+?)(,|,.+,)\2(,|$)/g)
Questions...
- Can this regex be improved?
- Does it cover all possible scenarios where a comma is not in the separated strings?
- Is there a better (preferably more readable and more efficient) way to do this?
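For the first question, one possible simplification (a sketch, not exhaustively tested) replaces the odd `(,|,.+,)` alternation with an optional middle section, so the backreference always matches a whole comma-delimited item:

```javascript
// An item is [^,]+ bounded by start-of-string or a comma; the same
// captured item must reappear later as a whole item.
// (?:.*,)? optionally skips any items between the two occurrences.
var dupRe = /(^|,)([^,]+),(?:.*,)?\2(,|$)/;

var hasDup = dupRe.test("some,thing,here,some,whatever");
// hasDup === true, because "some" appears twice as a complete item
```

Because `\2` is anchored by a comma (or start) on the left and a comma (or end) on the right, it cannot match a mere substring of a longer item such as "some" inside "something".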
I don't see the purpose of using regexes here, unless you like unimaginable pain. If I had to find duplicates, I would:
Obtain an array of words
var words = "...".split(',');
optionally lowercase everything, if you feel like doing that
sort the array
words.sort()
Duplicates should now all be in consecutive positions of the array.
As an extra advantage, I'm pretty sure this would be vastly more efficient than a regex version.
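The steps above can be sketched as a single function (the name `findDups` is mine, not from the original answer): split, sort, then scan adjacent positions for equal neighbours.

```javascript
// Sort-based duplicate detection: after sorting, duplicate items
// sit in consecutive positions, so one linear scan finds them.
function findDups(list) {
    var words = list.split(',');
    words.sort();
    var dups = [];
    for (var i = 1; i < words.length; i++) {
        // compare each word with its predecessor in the sorted array
        if (words[i] === words[i - 1] && dups.indexOf(words[i]) === -1) {
            dups.push(words[i]);
        }
    }
    return dups;
}

findDups("some,thing,here,some,whatever"); // ["some"]
```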
If I wanted to find dups in a comma separated list, I'd do it like this, using the hash capabilities of an object to accumulate unique values and detect dups:
function getDups(list) {
    var data = list.split(",");
    var uniques = {}, dups = {}, item, uniqueList = [];
    for (var i = 0; i < data.length; i++) {
        item = data[i];
        // hasOwnProperty guards against inherited keys such as "toString"
        if (uniques.hasOwnProperty(item)) {
            // found dup
            dups[item] = true;
        } else {
            // found unique item
            uniques[item] = true;
        }
    }
    // at the end here, you'd have an object called uniques with all the unique items in it
    // you could turn that back into an array easily if you wanted to
    // since it uses the object hash for dup detection, it scales to large numbers of items just fine
    // you can return whatever you want here (a new list, a list of dups, etc...)
    // in this implementation, I chose to return an array of unique values
    for (var key in uniques) {
        uniqueList.push(key);
    }
    return uniqueList; // return array of unique values
}
var list = "some,thing,here,some,whatever";
getDups(list);
Here's a jsFiddle that shows it working: http://jsfiddle.net/jfriend00/NGQCz/
This type of implementation scales well with large numbers of words because the dup detection is relatively efficient.
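For reference, the same hash-based idea can be written more compactly with an ES6 `Set` (a later addition to the language; `getDupsSet` is my name for this variant, not part of the original answer):

```javascript
// Same O(n) hash-based duplicate detection using an ES6 Set,
// which avoids the prototype-key pitfalls of plain objects.
function getDupsSet(list) {
    var seen = new Set();
    var dups = new Set();
    list.split(",").forEach(function (item) {
        if (seen.has(item)) {
            dups.add(item);
        } else {
            seen.add(item);
        }
    });
    return Array.from(dups); // array of duplicated values
}

getDupsSet("some,thing,here,some,whatever"); // ["some"]
```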