Improve Regex to match duplicates in a list
I am using a regex to find duplicates in a list. It is only a short comma-separated list, and performance is not an issue, so there is no need to tell me I should not use regex for those reasons.
// returns a match because some is repeated
"some,thing,here,some,whatever".match(/(^|,)(.+?)(,|,.+,)\2(,|$)/g)
Questions...
- Can this regex be improved?
- Does it cover all possible scenarios where a comma is not in the separated strings?
- Is there a better (preferably more readable and more efficient) way to do this?
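For the first question, one possible simplification (a sketch, not exhaustively tested) replaces the odd `(,|,.+,)` alternation with an optional middle section, so the backreference always matches a whole comma-delimited item:

```javascript
// An item is [^,]+ bounded by start-of-string or a comma; the same
// captured item must reappear later as a whole item.
// (?:.*,)? optionally skips any items between the two occurrences.
var dupRe = /(^|,)([^,]+),(?:.*,)?\2(,|$)/;

var hasDup = dupRe.test("some,thing,here,some,whatever");
// hasDup === true, because "some" appears twice as a complete item
```

Because `\2` is anchored by a comma (or start) on the left and a comma (or end) on the right, it cannot match a mere substring of a longer item such as "some" inside "something".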
I don't see the purpose of using regexes here, unless you like unimaginable pain. If I had to find duplicates, I would:
Obtain an array of words
var words = "...".split(',');
optionally lowercase everything, if you feel like doing that
sort the array
words.sort()
Duplicates should now all be in consecutive positions of the array.
As an extra advantage, I'm pretty sure this would be vastly more efficient than a regex version.
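The steps above can be sketched as a single function (the name `findDups` is mine, not from the original answer): split, sort, then scan adjacent positions for equal neighbours.

```javascript
// Sort-based duplicate detection: after sorting, duplicate items
// sit in consecutive positions, so one linear scan finds them.
function findDups(list) {
    var words = list.split(',');
    words.sort();
    var dups = [];
    for (var i = 1; i < words.length; i++) {
        // compare each word with its predecessor in the sorted array
        if (words[i] === words[i - 1] && dups.indexOf(words[i]) === -1) {
            dups.push(words[i]);
        }
    }
    return dups;
}

findDups("some,thing,here,some,whatever"); // ["some"]
```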
If I wanted to find dups in a comma separated list, I'd do it like this, using the hash capabilities of an object to accumulate unique values and detect dups:
function getDups(list) {
    var data = list.split(",");
    var uniques = {}, dups = {}, item, uniqueList = [];
    for (var i = 0; i < data.length; i++) {
        item = data[i];
        // hasOwnProperty guards against inherited keys such as "toString"
        if (uniques.hasOwnProperty(item)) {
            // found dup
            dups[item] = true;
        } else {
            // found unique item
            uniques[item] = true;
        }
    }
    // at the end here, you'd have an object called uniques with all the unique items in it
    // you could turn that back into an array easily if you wanted to
    // since it uses the object hash for dup detection, it scales to large numbers of items just fine
    // you can return whatever you want here (a new list, a list of dups, etc...)
    // in this implementation, I chose to return an array of unique values
    for (var key in uniques) {
        uniqueList.push(key);
    }
    return uniqueList; // return array of unique values
}
var list = "some,thing,here,some,whatever";
getDups(list);
Here's a jsFiddle that shows it working: http://jsfiddle.net/jfriend00/NGQCz/
This type of implementation scales well with large numbers of words because the dup detection is relatively efficient.
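For reference, the same hash-based idea can be written more compactly with an ES6 `Set` (a later addition to the language; `getDupsSet` is my name for this variant, not part of the original answer):

```javascript
// Same O(n) hash-based duplicate detection using an ES6 Set,
// which avoids the prototype-key pitfalls of plain objects.
function getDupsSet(list) {
    var seen = new Set();
    var dups = new Set();
    list.split(",").forEach(function (item) {
        if (seen.has(item)) {
            dups.add(item);
        } else {
            seen.add(item);
        }
    });
    return Array.from(dups); // array of duplicated values
}

getDupsSet("some,thing,here,some,whatever"); // ["some"]
```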