Parse a CSV file extracting some of the values but not all
Good day,
I have a local CSV file called DailyValues.csv whose values change daily.
I need to extract the value field for category2 and category4, then combine, sort, and remove any duplicates from the extracted values, and save the result to a new local file, NewValues.txt. Here is an example of the DailyValues.csv file:
category,date,value
category1,2010-05-18,value01
category1,2010-05-18,value02
category1,2010-05-18,value03
category1,2010-05-18,value04
category1,2010-05-18,value05
category1,2010-05-18,value06
category1,2010-05-18,value07
category2,2010-05-18,value08
category2,2010-05-18,value09
category2,2010-05-18,value10
category2,2010-05-18,value11
category2,2010-05-18,value12
category2,2010-05-18,value13
category2,2010-05-18,value14
category2,2010-05-18,value30
category3,2010-05-18,value16
category3,2010-05-18,value17
category3,2010-05-18,value18
category3,2010-05-18,value19
category3,2010-05-18,value20
category3,2010-05-18,value21
category3,2010-05-18,value22
category3,2010-05-18,value23
category3,2010-05-18,value24
category4,2010-05-18,value25
category4,2010-05-18,value26
category4,2010-05-18,value10
category4,2010-05-18,value28
category4,2010-05-18,value11
category4,2010-05-18,value30
category2,2010-05-18,value31
category2,2010-05-18,value32
category2,2010-05-18,value33
category2,2010-05-18,value34
category2,2010-05-18,value35
category2,2010-05-18,value07
I've found some helpful parsing examples at http://www.php.net/manual/en/function.fgetcsv.php and managed to extract all the values of the value column, but I don't know how to restrict the extraction to the category2 and category4 rows, nor how to sort the result and remove duplicates.
The solution needs to be in PHP, Perl, or shell script.
Any help would be much appreciated.
Thank you in advance.

Here's a shell script solution.
egrep 'category4|category2' input.file | cut -d"," -f1,3 | sort -u > output.file
I used the cut command just to show that you can extract certain columns only; its -f switch selects which fields (columns) you want to extract.
The -u switch for sort makes the output unique, i.e. duplicate lines are removed.
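Since the goal is to keep only the value field and to de-duplicate across both categories, you would presumably cut field 3 only and write straight to the target file, something along these lines (using the file names from your question):
egrep 'category2|category4' DailyValues.csv | cut -d"," -f3 | sort -u > NewValues.txt
Note that with -f1,3 the category stays in each output line, so a value that occurs under both category2 and category4 (value10, for instance) would appear twice; cutting only field 3 lets sort -u drop it.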
Edit:
It's important that you use egrep (or grep -E) and not plain grep, since basic grep uses a more restricted regular expression syntax in which the unescaped alternation operator | is not available, while egrep understands extended regular expressions and supports it.
Edit (for people who only have grep available):
grep 'category2' input.file > temp.file && grep 'category4' input.file >> temp.file && cut -d"," -f1,3 temp.file | sort -u > output.file && rm temp.file
It produces quite a bit of overhead, but it still works...
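If your grep doesn't support -E/egrep but does accept multiple -e options (standard POSIX grep does), you should be able to skip the temporary file entirely with something like:
grep -e 'category2' -e 'category4' input.file | cut -d"," -f1,3 | sort -u > output.file
Each -e adds a pattern, and a line is selected if it matches any of them, so this behaves like the egrep alternation above.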