awk: how to remove duplicates in a field, except for some specific strings
This is the structure of my csv file:
Oslo Company1 Mission1
Oslo Company1 Mission2
Oslo Company3 Missionspecial
Oslo Companyspecial Missionspecial
Paris Company2 Mission1
Paris Companyspecial Mission2
Paris Company3 Missionspecial
I want to delete all duplicates in fields 1, 2 and 3 and replace them with blanks, except for the special strings "Companyspecial" and "Missionspecial", so that the output is:
Oslo  Company1       Mission1
                     Mission2
      Company3       Missionspecial
      Companyspecial Missionspecial
Paris Company2
      Companyspecial
                     Missionspecial
All I know to do is remove all duplicates with this bit of code:
awk 'x[$1]++ {$1=""} x[$2]++ {$2=""} x[$3]++ {$3=""} {print $1,$2,$3}'
I'm no programmer. Help would be greatly appreciated, will save hours of stupid slave work! Thank you much in advance!
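awk '{
    for (i = 1; i <= 3; i++)                      # check each of the three fields
        if ($i !~ /(Mission|Company)special/)     # leave the special strings alone
            if (a[i, $i]++)                       # value already seen in this column?
                $i = ""                           # blank the duplicate
    printf("%-12s%-19s%-s\n", $1, $2, $3)         # print in fixed-width columns
}'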
Proof of concept HERE
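For a quick local check, the script can be run against the sample data from the question. This is only a sketch: the wrapper name `dedupe_fields` is invented here for convenience, and the here-document stands in for the real file.

```shell
# dedupe_fields is a hypothetical wrapper name; the awk body is the script above.
dedupe_fields() {
  awk '{
    for (i = 1; i <= 3; i++)
      if ($i !~ /(Mission|Company)special/)   # special strings are never blanked
        if (a[i, $i]++)                       # already seen in this same column
          $i = ""
    printf("%-12s%-19s%-s\n", $1, $2, $3)
  }'
}

# Sample data from the question, fed on standard input.
dedupe_fields <<'EOF'
Oslo Company1 Mission1
Oslo Company1 Mission2
Oslo Company3 Missionspecial
Oslo Companyspecial Missionspecial
Paris Company2 Mission1
Paris Companyspecial Mission2
Paris Company3 Missionspecial
EOF
```

Each output line keeps its three columns, padded by the printf widths, with repeated values blanked and the two special strings always printed.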
Edit
Updated the code to address the concern that text in one field could cause the same text in a different field to be removed. I did this by changing a[$i]++ to a[i,$i]++, so that each field's text is keyed together with its field number.
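The difference between the two keys shows up on a made-up two-line input where the same word appears in different columns. In this sketch the function names `shared_key` and `per_column_key` are invented for illustration, and duplicates are replaced with "-" rather than an empty string purely so the effect is visible.

```shell
# Hypothetical helper: one counter shared by all columns, keyed on the text only.
shared_key() {
  awk '{ for (i = 1; i <= 3; i++) if (a[$i]++) $i = "-"; print }'
}

# Hypothetical helper: counter keyed on (column number, text).
per_column_key() {
  awk '{ for (i = 1; i <= 3; i++) if (a[i, $i]++) $i = "-"; print }'
}

# "X" is in column 1 of the first line and column 2 of the second line.
printf 'X Y Z\nW X Q\n' | shared_key       # second line prints as: W - Q
printf 'X Y Z\nW X Q\n' | per_column_key   # second line prints as: W X Q
```

With the shared key, the X in column 2 is wrongly treated as a repeat of the X in column 1. With the per-column key it is left alone.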