Finding Duplicates (Regex)

2023-01-17 23:32

I have a CSV file containing a list of 500 members with their phone numbers. I tried diff tools, but none of them seem to find duplicates.

Can I use regex to find duplicate rows by members' phone numbers?

I'm using TextMate on a Mac.

Many thanks

What duplicates are you searching for? The whole lines or just the same phone number?

If it is the whole line, then try this:

sort phonelist.txt | uniq -c | sort -n

The lines that occur more than once will appear at the bottom of the output, each prefixed with its count.
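The same whole-line count can be sketched in Python for anyone who prefers a script to a shell pipeline. This is only a minimal illustration: the function name `duplicate_lines` and the sample rows are made up here, not taken from the question.

```python
from collections import Counter

def duplicate_lines(lines):
    """Return {line: count} for every line that occurs more than once."""
    # Strip the trailing newline so the last line of a file compares equal
    # to earlier copies even if it lacks a final newline.
    counts = Counter(line.rstrip("\n") for line in lines)
    return {line: n for line, n in counts.items() if n > 1}

# Hypothetical sample data standing in for phonelist.txt
sample = ["alice,555-0100\n", "bob,555-0101\n", "alice,555-0100\n"]
print(duplicate_lines(sample))
```

For a real file, passing the open file object works as well, for example `duplicate_lines(open("phonelist.txt"))`.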

If it is just the phone number in some column, then use this:

awk -F ';' '{print $4}' phonelist.txt | sort | uniq -c | sort -n

Replace the '4' with the number of the column that holds the phone number, and the ';' with the separator actually used in your file.

Or give us a few example lines from this file.

EDIT:

If the data format is: name,mobile,phone,uniqueid,group, then use the following:

awk -F ',' '{print $3}' phonelist.txt | sort | uniq -c | sort -n

on the command line.
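As a rough Python equivalent of the column-based pipeline, the sketch below counts the values in the third column and reports those that appear more than once. It assumes the name,mobile,phone,uniqueid,group layout mentioned above; the function name `duplicate_phones` and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter

def duplicate_phones(csv_file, column=2, delimiter=","):
    """Count the values in one column and return those seen more than once.

    `column` is zero-based, so 2 corresponds to awk's $3.
    """
    reader = csv.reader(csv_file, delimiter=delimiter)
    # Skip rows that are too short to contain the requested column.
    counts = Counter(row[column].strip() for row in reader if len(row) > column)
    return {phone: n for phone, n in counts.items() if n > 1}

# Hypothetical rows in the name,mobile,phone,uniqueid,group layout
sample = io.StringIO(
    "Ann,0700-1,555-0100,1,A\n"
    "Bob,0700-2,555-0101,2,A\n"
    "Cid,0700-3,555-0100,3,B\n"
)
print(duplicate_phones(sample))
```

Unlike the awk one-liner, the `csv` module also copes with quoted fields that contain the separator.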

Yes. For one way to do it, look here. But you would probably not want to do it this way.

You can simply parse this file in the normal way and check which rows are duplicated. I think regex is the worst solution for this problem.

What language are you using? In .NET, with little effort you could load the CSV file into a DataTable and find or remove the duplicate rows. Afterwards, write your DataTable back to another CSV file.
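The load, deduplicate, and write-back idea is not specific to .NET. The sketch below shows the same flow in Python rather than with a DataTable; the helper name `drop_duplicate_rows`, the two-column sample, and the choice to keep the first occurrence of each phone number are all assumptions made for illustration.

```python
import csv
import io

def drop_duplicate_rows(rows, key_column):
    """Keep the first row for each key value; return (kept, dropped)."""
    seen = set()
    kept, dropped = [], []
    for row in rows:
        key = row[key_column]
        if key in seen:
            dropped.append(row)   # a later row with an already-seen key
        else:
            seen.add(key)
            kept.append(row)
    return kept, dropped

# Hypothetical input: name,phone
sample = io.StringIO("Ann,555-0100\nBob,555-0101\nCid,555-0100\n")
kept, dropped = drop_duplicate_rows(csv.reader(sample), key_column=1)

# Write the deduplicated rows back out as CSV (here to an in-memory buffer).
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(kept)
print(out.getvalue())
```

Returning the dropped rows separately makes it easy to review them before discarding anything.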

Heck, you can load this file into Excel, sort by a field, and find the duplicates manually. 500 isn't THAT many.

Use Perl.

Load the CSV file into an array and extract the column you want to check (the phone numbers) into a second array. Then reduce that second array to its unique values using:

my %seen;
my @unique = grep !$seen{$_}++, @array2;

After that, loop over the unique array (the phone numbers) and, inside that loop, loop over the first array (the lines). Compare each line's phone number with the current unique number, and if they match, write that line to another CSV file. The output then has the lines grouped by phone number, so duplicates end up next to each other.
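A variation on this grouping idea, sketched in Python instead of Perl, collects the lines under each phone number in a single pass and returns only the groups that have more than one line. The function name `lines_with_shared_phone`, the default column, and the sample rows are hypothetical, and the naive `split` does not handle quoted fields.

```python
from collections import defaultdict

def lines_with_shared_phone(lines, column=2, delimiter=","):
    """Return the full lines whose phone number appears on more than one line."""
    groups = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        fields = line.split(delimiter)  # naive split: no quoted-field support
        if len(fields) > column:
            groups[fields[column].strip()].append(line)
    # Dicts preserve insertion order, so groups come out in first-seen order.
    return [line for group in groups.values() if len(group) > 1 for line in group]

# Hypothetical rows in the name,mobile,phone,uniqueid,group layout
sample = [
    "Ann,0700-1,555-0100,1,A\n",
    "Bob,0700-2,555-0101,2,A\n",
    "Cid,0700-3,555-0100,3,B\n",
]
for line in lines_with_shared_phone(sample):
    print(line)
```

Using a dictionary keyed by phone number avoids the nested loop, which matters little for 500 rows but keeps the code short.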
