extracting only unique row from csv based on multiple column criteria?
basically i have rows after rows of data
event | venue | date | some | other | columns | here|
there are lot of overlapping event that happens at same venue and date.
so i would like to end up with only unique events by matching event, venue, date.
is this possible in excel , how to approach开发者_运维技巧 this ? VBA ? or some option i dont know about ?
thank you.
UPDATE:
i just went to data->remove duplicates
Using TxtSushi you could run Jerry's select directly on a CSV file. So:
tssql -table event_table events.csv \ 'select event, venue, date from event_table group by event, venue, date'
I advertise it as a *NIX tool but it works on windows too.
The easiest way would probably be to import the data into a database instead. At that point, a SQL query (for example) would be something like:
select event, venue, date
from event_table
group by event, venue, date
Regardless of what you think of it as a spreadsheet, Excel is not really a good database manager.
if you are on *nix, you can use awk
awk -F"|" '{a[$1$2$3]=$0}END{ for(i in a) print i,a[i]}' csv
Great answers so far.
Learn a scripting language.
Any scripting language.
One way of doing what you want in Python:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import csv
def main():
FIN = 'in.csv'
with open(FIN, 'r') as fin:
col_name_reader = csv.reader(fin)
# assuming the first line of your csv file has column names
col_names = col_name_reader.next()
csv_dict_reader = csv.DictReader(fin, col_names)
uniq_events = set(tuple((row['event'], row['venue'], row['date'])
for row in csv_dict_reader))
print uniq_events
if __name__ == "__main__":
main()
Using a test file populated as follows:
event,venue,date an_event,a_venue,2010-01-01 an_event,a_venue,2010-01-01 an_event,a_venue,2010-01-01 another_event,another_venue,2010-01-02 another_event,another_venue,2010-01-02
We get:
set([('an_event', 'a_venue', '2010-01-01'),
('another_event', 'another_venue', '2010-01-02')])
Best of luck!
精彩评论