开发者

extracting only unique row from csv based on multiple column criteria?

basically i have rows after rows of data

event | venue | date | some | other | columns | here|

there are lot of overlapping event that happens at same venue and date.

so i would like to end up with only unique events by matching event, venue, date.

is this possible in excel , how to approach开发者_运维技巧 this ? VBA ? or some option i dont know about ?

thank you.

UPDATE:

i just went to data->remove duplicates


Using TxtSushi you could run Jerry's select directly on a CSV file. So:

tssql -table event_table events.csv \
'select event, venue, date 
    from event_table
    group by event, venue, date'

I advertise it as a *NIX tool but it works on windows too.


The easiest way would probably be to import the data into a database instead. At that point, a SQL query (for example) would be something like:

select event, venue, date 
    from event_table
    group by event, venue, date

Regardless of what you think of it as a spreadsheet, Excel is not really a good database manager.


if you are on *nix, you can use awk

awk -F"|" '{a[$1$2$3]=$0}END{ for(i in a) print i,a[i]}' csv


Great answers so far.

Learn a scripting language.
Any scripting language.

One way of doing what you want in Python:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import csv

def main():
    FIN = 'in.csv'
    with open(FIN, 'r') as fin:
        col_name_reader = csv.reader(fin)

        # assuming the first line of your csv file has column names
        col_names = col_name_reader.next()
        csv_dict_reader = csv.DictReader(fin, col_names)

        uniq_events = set(tuple((row['event'], row['venue'], row['date']) 
            for row in csv_dict_reader))

        print uniq_events

if __name__ == "__main__":
    main()

Using a test file populated as follows:

event,venue,date
an_event,a_venue,2010-01-01
an_event,a_venue,2010-01-01
an_event,a_venue,2010-01-01
another_event,another_venue,2010-01-02
another_event,another_venue,2010-01-02

We get:

set([('an_event', 'a_venue', '2010-01-01'),
     ('another_event', 'another_venue', '2010-01-02')])

Best of luck!

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜