sed or awk - deleting strings between patterns
I have a CSV file with lines like this:
AAA,A-name,num1,num2,num3
BBB,B-name,num1,num2,num3
CCC.DDD,C-name,num1,num2,num3
EEE.FFF.GGGG,E-name,num1,num2,num3
HHH.H-name,num1,num2,num3
...
Some lines have one identifier (like AAA); some have two (like CCC); some have three or more (like EEE). Some identifiers are not three characters long. I need to remove all but the first identifier from each line (that is, delete the first period and everything after it, up to the first comma), producing this:
AAA,A-name,num1,num2,num3
BBB,B-name,num1,num2,num3
CCC,C-name,num1,num2,num3
EEE,E-name,num1,num2,num3
HHH,H-name,num1,num2,num3
...
I've tried a few pattern-replace methods but am getting tripped up. Does anyone have the syntax I need?
sed 's/^\([^.]\{1,\}\)[^,]*/\1/'
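As a rough sketch of how this expression behaves: it captures the text before the first period and then drops everything up to the next comma. The variant below uses `[^.,]` instead of `[^.]` so the capture cannot run past the first field. That only matters if later fields can contain periods (decimal numbers, say), which is an assumption the question does not state. The function name `strip_ids` is made up for illustration.

```shell
# strip_ids: hypothetical helper. Reads CSV lines on stdin and keeps only
# the part of the first field that precedes its first period.
strip_ids() {
    # \([^.,]\{1,\}\)  capture leading characters that are neither '.' nor ','
    # [^,]*            then drop everything up to (not including) the first comma
    sed 's/^\([^.,]\{1,\}\)[^,]*/\1/'
}

# The first line has no period in field 1, so its 1.5 must survive untouched.
printf '%s\n' 'AAA,A-name,1.5,num2,num3' 'EEE.FFF.GGGG,E-name,num1,num2,num3' | strip_ids
# AAA,A-name,1.5,num2,num3
# EEE,E-name,num1,num2,num3
```

With the original `[^.]` form, the first sample line would come out as `AAA,A-name,1,num2,num3`, because the capture extends across commas until it reaches the period in `1.5`.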
Just remove everything from the first dot up to (but not including) the first comma. For the file
$ cat foo
AAA,A-name,num1,num2,num3
BBB,B-name,num1,num2,num3
CCC.DDD,C-name,num1,num2,num3
EEE.FFF.GGGG,E-name,num1,num2,num3
HHH.H-name,num1,num2,num3
use this sed command:
$ sed 's/\.[^,]*//' foo
AAA,A-name,num1,num2,num3
BBB,B-name,num1,num2,num3
CCC,C-name,num1,num2,num3
EEE,E-name,num1,num2,num3
HHH,num1,num2,num3
Note that this also removes H-name from the last line, because the input has a period after HHH where the other lines have a comma. That seems to be a typo in your example.
Using perl
$ perl -pe 's/\.[A-Z.]*?,/,/' input
AAA,A-name,num1,num2,num3
BBB,B-name,num1,num2,num3
CCC,C-name,num1,num2,num3
EEE,E-name,num1,num2,num3
HHH.H-name,num1,num2,num3
sed
$ sed 's/\.[A-Z.]*,/,/' input
AAA,A-name,num1,num2,num3
BBB,B-name,num1,num2,num3
CCC,C-name,num1,num2,num3
EEE,E-name,num1,num2,num3
HHH.H-name,num1,num2,num3
and awk
$ awk '/\./{sub(/\.[A-Z.]*,/, ",", $0)}{print}' input
AAA,A-name,num1,num2,num3
BBB,B-name,num1,num2,num3
CCC,C-name,num1,num2,num3
EEE,E-name,num1,num2,num3
HHH.H-name,num1,num2,num3
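The three commands above only match uppercase letters and periods after the first dot, which is why the HHH.H-name line passes through unchanged. If the identifiers can contain other characters, one alternative is to let awk split on commas and edit only the first field. This is a minimal sketch, assuming no field contains a quoted comma; the function name `first_id` is hypothetical.

```shell
# first_id: hypothetical helper. Trims the first CSV field at its first period.
first_id() {
    # -F, splits each line on commas; OFS=, rejoins with commas once $1 changes.
    # sub() is applied to field 1 only, so periods in later fields are left alone.
    # The trailing 1 is an always-true pattern that prints every line.
    awk -F, -v OFS=, '{ sub(/\..*/, "", $1) } 1'
}

printf '%s\n' 'EEE.FFF.GGGG,E-name,1.5,num2,num3' 'HHH.H-name,num1,num2,num3' | first_id
# EEE,E-name,1.5,num2,num3
# HHH,num1,num2,num3
```

Unlike the `[A-Z.]` versions, this treats `HHH.H-name` as a single first field and cuts it down to `HHH`, so whether it fits depends on whether that period is a typo in the data.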